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Executive  Summary 


Improvised  Explosive  Devices  (lEDs),  in  different  forms,  are  a  favorite  weapon  for  global  in¬ 
surgents  and  terrorists,  with  an  average  of  260 IED  incidents  per  month  in  Afghanistan  and  Iraq 
for  the  time  from  January  2004  until  May  2010.  Terrorist’s  ability  to  quickly  change  the  IED 
type  requires  a  flexible  approach  in  IED  detection  training.  To  create  an  effective  training  tool, 
an  image  taken  in  the  field  must  be  segmented  into  two  sections,  separating  the  IED  and  the 
image  background. 

Different  from  objects  typically  used  to  evaluate  the  quality  of  segmentation  algorithms,  IEDs 
are  not  easy  to  detect  even  for  humans.  They  are  purposefully  hard  to  detect,  concealed  or 
colored  very  similar  to  the  background.  This  thesis  evaluates  existing  algorithms  for  image 
segmentation  and  creates  new  algorithms  to  pre-process  colored  images  in  order  to  segment 
IEDs. 

We  first  modify  a  basic  graph  cut  algorithm  to  segment  completely  visible  IEDs  from  a  uni¬ 
formly  colored  background.  To  measure  the  performance  depending  on  the  parameter  settings, 
we  run  a  series  of  tests  defined  by  a  Nearly  Orthogonal  Latin  Hypercube  design  of  experiment. 

A  second  series  of  tests  shows  that  the  basic  approach  does  not  result  in  good  segmentations  if 
applied  to  realistic  IEDs.  We  identify  two  major  issues  that  are  due  the  special  nature  if  IEDs: 


•  Different  from  other  segmentation  tasks,  real-world  IEDs  are  only  a  small  fraction  of  the 
complete  image.  This  results  in  a  training  area  for  the  background  model  that  is  extremely 
large  compare  to  the  training  area  of  the  IED. 

•  IEDs  do  not  necessary  appear  as  a  single  object.  Often,  charges  and  initiation  systems 
appear  -if  visible-  as  different  objects.  Visible  parts  of  an  IEDs  can  be  small  and  partially 
hidden.  Therefore,  a  single,  manually  provided  training  area  of  an  IED  is  not  precise 
enough. 


We  address  the  increased  difficulty  while  segmenting  IEDs  with  three  key  contributions: 

1.  Our  algorithm  automatically  divides  a  user-defined  background  area  into  smaller  areas. 
We  generate  separate  color  models  for  each  of  these  areas,  to  ensures  that  a  color  model 
includes  only  colors  that  appear  close  together  in  the  background.  For  each  of  these  areas, 
we  build  a  color  model  that  represents  the  complete  RGB  color  space  of  the  area. 

2.  We  compress  each  of  these  complex  color  models  into  a  statistical  model  by  applying  the 
k-means  Clustering  algorithm  for  up  to  10  clusters.  This  compressed  statistical  descrip¬ 
tion  increases  the  number  of  background  models  we  can  hold  simultaneously  in  working 
memory. 


xix 


3.  We  use  the  estimated  distance  to  the  background  to  estimate  the  initial  object  label  auto¬ 
matically.  In  this  way,  we  can  segment  an  IED  without  user  input  labeling  the  IED. 


The  results  of  this  thesis  have  three  different  applications:  1)  defining  the  IED  on  an  image,  2) 
generating  new  training  images,  3)  helping  humans  detect  IEDs. 

In  order  to  distinguish  between  a  student’s  detection  and  misdetection  of  an  IED  in  a  colored 
image,  the  area  of  an  IED  must  be  defined.  Our  work  defines  this  area  through  the  set  of  pixels 
at  the  end  of  the  graph  cut  algorithm. 

To  generate  new  training  images,  a  segmented  IED  can  be  combined  with  images  showing  a 
different  background.  Our  work  facilitates  this  through  precise  IED  segmentation. 

Finally,  the  automated  IED  labels  we  generate  may  be  helpful  as  a  hybrid  between  a  computer 
pre-processing  the  image,  and  a  human  operator  providing  the  final  identification.  The  auto¬ 
matic  label  computation  is  efficient  and  can  be  computed  in  real-time. 
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CHAPTER  1: 

Introduction 


1.1  Motivation  of  Our  Study 

1.1.1  Background  and  Area  of  Research 

Improvised  Explosive  Devices  (IEDs),  in  different  forms,  are  a  favorite  weapon  for  global  insur¬ 
gents  and  terrorists,  with  an  average  of  260  IED  incidents  per  month  in  Afghanistan  and  Iraq. 
The  use  of  IEDs  has  become  a  generic  global  threat  [1],  [2].  IEDs  consist  mainly  of  charges 
and  initiation  systems.  The  charges  are  often  available  from  past  armed  conflicts,  and  the  ini¬ 
tiation  systems  are  often  commercial  detonators,  readily  available  for  legitimate  purposes  and 
often  mass-produced  [2].  Therefore,  cheap  and  easy  to  make,  IEDs  have  become  the  weapon  of 
choice  in  unconventional  warfare  [3]. 

IED  detection  has  two  forms:  the  automatic  detection  with  specialized  equipment  and  manual 
detection  by  trained  individuals.  These  two  methods  of  detection  are  noncompetitive,  and  both 
serve  to  decrease  the  threat  caused  by  IEDs. 

Terrorists’  ability  to  quickly  change  the  IED  type  requires  a  flexible  approach  in  IED  detection 
training.  The  Training  and  Doctrine  Command  (TRADOC)  Analysis  Center-Monterey  was 
asked  to  support  Future  Warrior  Technology  Integration  (FWTI)  by  analyzing  and  designing  a 
flexible  application  in  the  training  for  IED  detection.  This  application  is  called  the  Perceptual 
Learning  Trainer  (PLT). 

Actual  photographs  of  IEDs  and  their  surroundings  are  required  to  generate  images  for  IED 
detection  training  and  to  generate  templates  for  automatic  IED  detection.  To  create  an  effective 
training  tool,  an  image  taken  in  the  field  must  be  segmented  into  two  separate  sections,  identi¬ 
fying  the  IED  and  the  image  background.  This  thesis  focuses  on  analyzing  image  segmentation 
algorithms  to  perform  this  task. 

1.1.2  The  Importance  of  Cutting  IEDs  from  a  Photograph 

The  following  scenarios  explain  the  utility  of  separating  an  IED  from  the  background,  given  an 
original  combined  image  as  input. 
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1.  An  instructor  wants  to  train  individuals  to  detect  IEDs  in  real-world  photographs: 

A  photograph  of  an  IED  and  surroundings  usually  consists  of  three  regions:  the  actual 
IED,  an  unusual  region  around  the  IED,  and  the  background.  To  distinguish  between 
a  student’s  detection  and  misdetection,  the  instructor  requires  an  automatic  method  of 
defining  these  regions. 

2.  An  instructor  wants  to  create  new  training  images  in  advance: 

IED  types  used  in  Iraq  also  tend  to  be  used  in  Afghanistan  after  a  certain  amount  of  time. 
Clay  Wilson,  Specialist  in  Technology  and  National  Security  Foreign  Affairs,  Defense, 
and  Trade  Division,  said  in  the  CRS  Report  for  Congress  [4]  that:  "'Observers  have  indi¬ 
cated  that  the  Taliban  forces  in  Afghanistan  appear  to  have  learned  some  IED  techniques 
from  the  Iraqi  insurgents.”  Therefore,  combining  IEDs  detected  in  Iraq  and  a  background 
from  Afghanistan  could  be  helpful  in  generating  flexible  training.  One  way  of  doing  this  is 
to  create  a  knowledge  base  of  IED  types  and  their  appearance.  Since  the  new  background 
could  be  different  than  the  original  background,  the  IED  image  has  to  be  segmented  from 
the  background  image  so  that  there  are  as  few  as  possible  background  pixels  left  on  the 
IED  image. 

3.  IED  images  are  used  to  automatically  identify  if  an  IED  is  present  in  an  image: 

General  approaches  for  object  recognition  based  on  templates  ( Pattern  Matching)  use 
algorithms  that  require  an  IED  template  to  be  extracted  from  known  IED  images.  The 
IED  images  have  to  be  segmented  to  remove  the  background  before  the  template  can  be 
generated. 

1.1.3  The  Difficulty  of  Cutting  IEDs  from  a  Photograph 

The  main  part  of  this  thesis  is  evaluating  existing  algorithms  for  image  segmentation  and  create 
new  algorithms  to  meet  the  special  requirements  of  separating  IEDs  and  their  background.  Fig¬ 
ure  1.1  shows  some  colored  images  [5]  typically  used  to  evaluate  the  quality  of  segmentation 
algorithms. 

Note  that  all  objects  shown  in  Figure  1.1  are  completely  visible  and  colored  differently  than 
the  background.  The  edges  are  connected  and  define  a  closed  surface.  In  contrast,  Figure  1.2 
shows  real-life  examples  of  IEDs.  The  biggest  difference  is  that  IEDs  are  not  easy  to  detect 
even  for  humans.  They  are  purposefully  hard  to  detect,  concealed  or  colored  very  similar  to  the 
background.  Therefore  it  is  not  clear  how  well  existing  algorithms  perform,  if  they  are  used  to 
cut  IEDs. 
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Figure  1.1:  Examples  of  images  similar  to  the  images  which  were  used  to  test  the  performance  of  GrabCut.  The 
examples  are  part  of  the  Berkeley  Segmentation  Data  Set  and  Benchmarks  500.  Typically  the  object  is  of  a  different 
color  than  the  background  and  not  concealed. 


Figure  1.2:  Examples  of  lEDs  provided  by  TRADOC.  The  lEDs  in  the  second  row  appear  to  be  closed  and  mono- 
tonically  colored.  The  boundaries  are  completely  visible  and  in  a  good  contrast  to  the  background.  The  lEDs  shown 
in  the  first  row  are  partially  hidden,  so  they  do  not  appear  as  one  closed  surface. 


1.2  Problem  Terminology 

1.2.1  Introducing  Basic  Terms 

The  following  explained  terms  are  used  in  this  thesis. 


•  A  colored  image  is  the  complete  picture  showing  the  IED  and  the  surroundings  before 
any  processing  on  the  image  takes  place.  This  image  could  be  taken  by  a  military  patrol 
after  detecting  an  IED. 

•  A  training  image  is  a  colored  image  that  shows  the  defined  region  of  the  IED,  the  accep¬ 
tance  region  and  the  background.  A  typical  training  image  is  shown  in  Figure  1.3. 


3 


Figure  1 .3:  Example  of  a  Training  Image.  The  red  polygon  surrounds  the  region  of  the  IED,  and  the  yellow  polygon 
surrounds  the  related  acceptance  region.  All  other  pixels  belong  to  the  background  of  this  image. 

•  A  generated  training  image  is  a  training  image  which  was  generated  by  combining  an 
IED  image  and  a  background  image.  The  source  of  a  generated  training  image  is  not  an 
actual  colored  image.  As  in  a  Training  Image,  in  a  generated  Training  Image  the  area  of 
the  IED,  the  acceptance  region  and  the  background  are  defined. 

•  Image  segmentation  is  the  process  of  partitioning  a  digital  image  into  multiple  segments. 
The  goal  of  segmentation  is  to  simplify  and/or  change  the  representation  of  an  image  into 
something  more  meaningful  and  easier  to  analyze  [6].  Image  segmentation  is  typically 
used  to  locate  objects  and  boundaries  in  images. 

•  An  object  in  image  segmentation  refers  to  a  region  in  the  colored  image  which  is  of 
particular  interest.  In  general,  this  could  be  any  kind  of  object  such  as  a  person,  a  flower, 
or  an  animal.  In  this  thesis  the  object  refers  always  to  an  IED. 

•  Interactive  image  segmentation  allows  the  user  to  approximately  pre-define  the  pixels 
which  belong  to  the  object  or  the  background.  Some  of  the  algorithms  we  evaluate 
use  interactive  image  segmentation  to  find  a  good  balance  between  user  interactions  and 
achieved  segmentation  results.  In  the  case  of  an  object  that  is  easy  to  separate  from  the 
background,  the  object  pixels  need  to  be  only  roughly  defined  to  get  a  good  segmentation 
result.  If  the  result  is  not  good  enough,  it  can  be  improved  by  adding  or  excluding  more 
object  pixels.  The  amount  of  user  interaction  strongly  depends  on  the  algorithm  used  and 
the  difficulty  of  the  segmentation  task. 

•  An  acceptance  region  is  the  unusual  region  around  the  IED.  This  set  of  pixels  of  the 
colored  image  is  not  part  of  the  IED  but  indicates  that  an  IED  might  be  present.  Figure 
1.3  shows  such  a  region  surrounded  by  a  yellow  polygon. 
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•  A  background  is  the  the  region  which  does  not  belong  to  either  the  IED  or  the  acceptance 
region.  Anything  that  is  not  “object”  or  “acceptance  region”  we  call  “background.” 

•  Labeling  in  our  work  defines  a  process,  where  either  the  user  manually  or  an  algorithm 
automatically  defines  initial  regions  in  the  image. 

•  Similarity  in  the  context  of  image  segmentation  measures  the  equality  among  different 
pixels.  This  measurement  can  be  based  on  different  criteria,  i.e.,  structure,  color,  texture, 
etc. 

•  Opposite  to  similarity,  discontinuity  measure  how  different  pixels  are.  In  image  seg¬ 
mentation,  a  high  discontinuity  among  pixels  usually  means  that  they  belong  to  different 
regions. 


1.3  Research  Objectives 

1.3.1  Scope  and  Objectives 

Our  goal  is  to  find  a  general  approach  for  separating  an  IED  from  the  background  given  a 
colored  image.  This  objective  must  be  achievable  by  soldiers  without  any  special  IT  background 
by  using  their  regular  equipment.  Another  restriction  of  the  problem  is,  that  the  segmentation 
has  to  be  precise  enough  to  allow  generating  a  new  image  by  combining  the  segmented  IED  and 
a  different  background. 

Practically,  our  goal  allows  for  the  following  sequence  of  actions: 

1.  A  military  patrol  detects  an  IED  and  takes  a  picture  of  the  IED  and  the  scene. 

2.  An  application  developed  in  this  thesis  segments  the  image  into  IED  and  background. 

3.  The  segmented  image  is  either  directly  used  as  a  training  image  in  IED  detection  training, 
or  to  generate  new  training  images. 

All  segmentation  algorithms  we  found  require  object  labels  either  as  initial  input  or  as  iterative 
input  to  improve  the  result  of  the  segmentation  task.  Because  of  the  special  circumstances 
while  being  on  a  military  patrol  and  because  IEDs  are  often  partially  hidden,  it  is  generally  not 
possible  to  provide  object  labels  manually.  Therefore,  we  design  an  algorithm  that  does  not 
require  the  user  to  define  the  initial  object  pixels  manually.  Similar  to  the  “GrabCut”  [7],  we 
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reduce  the  effort  a  user  has  to  invest  to  achieve  a  good  segmentation.  But  instead  of  decreasing 
the  quantity  of  the  user  interactions,  we  provide  a  technique  that  results  in  more  precise  initial 
object  labels. 

The  idea  of  our  stepwise  approach  is  to  first  generate  a  background  model  based  on  a  user- 
defined  background  area.  Based  on  this  model,  the  algorithm  highlights  all  unusual  objects 
that  do  not  fit  the  background  model.  The  result  of  the  second  step  is  an  overlay  similar  to  the 
second  image  in  Figure  1 .4.  The  color  codes  indicate  how  well  a  particular  pixel  fits  into  the 
background  model.  The  last  step  is  extracting  the  initial  object  pixels.  In  an  iterative  process, 
the  user  provides  a  threshold  for  the  gradient  between  blue  and  red.  The  set  of  all  pixels  above 
this  threshold  define  the  initial  object  label  as  shown  in  the  third  image  in  Figure  1 .4. 

To  verify  that  our  iteratively  generated  object  pixels  provide  a  better  baseline  for  the  actual  seg¬ 
mentation  than  manually  provided  object  pixels,  we  use  both  approaches  as  input  for  an  existing 
segmentation  algorithm  and  compare  the  segmentation  results.  For  our  purpose  a  segmentation 
result  is  considered  as  good  if  the  segmented  IED  can  be  used  to  generate  new  training  images. 

We  found  a  basic  algorithm  for  image  segmentation  introduced  by  Boykov  and  Jolly  2001  [8] 
and  several  more  advanced  algorithms  [7],  [9],  [10],  [11]  based  on  the  same  idea.  But,  to  clearly 
separate  the  impact  on  the  segmentation  results  achieved  by  using  our  estimated  object  pixels 
from  the  impact  of  using  a  more  advanced  algorithm,  we  use  the  basic  algorithm  introduced  by 
Boykov  and  Jolly  2001  [8]  for  testing. 

Out  of  scope: 

Approaches  discussed  in  the  thesis  try  to  estimate  object  pixels  as  a  baseline  for  the  actual  image 
segmentation.  The  set  of  all  estimated  object  pixels  generates  an  initial  object  label.  Once  a 
background  is  defined,  the  initial  object  labels  are  fast  to  calculate  and  can  be  used  to  highlight 
“unusual”  areas  on  an  image  as  shown  in  Figure  1.4.  Even  if  this  result  might  be  valuable  for 
human  operators  to  detect  IEDs,  detecting  IEDs  is  not  a  direct  objective  for  this  thesis.  In  the 
context  of  this  thesis,  detecting  unusual  areas  is  a  necessary  step  before  segmenting  an  image. 

This  thesis  is  not  creating  new  segmentation  algorithms.  Instead,  we  create  algorithms  to  pre- 
process  colored  images  in  order  to  enable  the  usage  of  existing  image  segmentation  algorithms 
for  IEDs. 
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Figure  1.4:  The  left  image  shows  a  real-world  example  of  an  IED  that  is  difficult  to  label.  The  image  in  the  middle 
shows  an  overlay  that  highlights  unusual  areas  on  a  image  and  the  right  image  shows  the  estimated  image  pixels 
that  are  used  to  perform  the  segmentation. 


1.3.2  Methodology 

On  an  abstract  level,  the  work  described  in  this  thesis  can  be  divided  in  four  main  parts.  Fol¬ 
lowing  a  short  explanation  about  these  four  working  packages. 

1.  To  verify  that  our  implementation  of  the  basic  approach  introduced  by  Boykov  and  Jolly 
2001  [8]  delivers  qualitative  similar  results  to  other  segmentation  algorithms,  we  run  a 
series  of  tests  using  colored  images  similar  to  the  left  image  in  Figure  1.5.  This  image 
does  not  show  a  realistic  example  of  an  IED  detected  by  a  military  patrols.  It  rather 
shows  an  example  of  an  object  used  to  test  the  performance  of  segmentation  algorithms 
in  a  non-military  context. 


Figure  1.5:  The  left  image  shows  a  colored  image  of  an  artillery-shell  found  in  Iraq  (image  published  by 
http://www.globalsecurity.org).  The  image  on  the  right  shows  a  real-world  example  of  an  IED  detected  by  a  mili¬ 
tary  patrol. 

2.  We  state  the  assumption  that  object  labels  cannot  be  provided  manually  for  most  of  the 
detected  IEDs.  To  test  this  assumption,  we  increase  the  grade  of  realism  by  applying  the 
Boykov  and  Jolly  algorithm  to  segment  colored  images  showing  real  world  examples  of 
IEDs.  The  right  image  in  Figure  1.5  shows  an  example  of  an  IED  in  a  realistic  scenario. 
Since  the  background  covers  parts  of  the  IED,  it  is  not  possible  to  draw  a  single  area  that 
includes  all  colors  of  the  IED  without  including  some  background  colors  as  well.  Also, 
defining  separate  areas  and  combining  them  requires  precision  that  is  not  present  while 
being  on  a  military  patrol.  Test  series  with  a  variety  of  manually  provided  object  labels 
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and  different  realistic  IEDs  show  that  segmentation  cannot  be  performed  if  the  initial 
object  pixels  are  not  defined  correctly. 

3.  Instead  of  asking  a  user  to  label  both  object  and  background  prior  to  segmentation,  our 
approach  asks  the  user  simply  to  provide  a  sample  of  the  background.  This  background 
sample  can  even  be  provided  using  an  entirely  different  image  than  the  one  intended  for 
segmentation.  Our  algorithm  then  estimates  the  initial  object  label  of  the  image  intended 
for  segmentation  without  further  user  input.  This  preprocessing  enables  the  usage  of 
existing  image  segmentation  algorithms  for  IEDs.  This  part  defines  the  main  part  of  our 
thesis. 

4.  We  use  the  basic  segmentation  algorithm  of  Boykov  and  Jolly  2001  [8]  and  run  a  series 
of  tests  using  the  same  parameters  as  before,  but  with  the  automatically  defined  initial 
object  pixels  instead  of  the  manually  defined  areas.  The  difference  between  the  results 
of  the  test  series  using  the  manually  provided  object  labels  and  the  test  series  using  the 
automatically  generated  object  labels,  is  used  to  measure  the  goodness  of  our  algorithm. 


1.3.3  Benefit  of  the  Study 

Search-and-target-acquisition  research  benefits  all  soldiers.  The  ability  to  detect  IEDs  in  com¬ 
plex  scenes  is  a  valuable  skill,  especially  in  today’s  operating  environment.  Determining  ways 
to  use  colored  images  showing  recently  detected  IEDs  and  generating  new  training  images  by 
combining  IEDs  and  backgrounds  expands  the  training-tool  set  for  today’s  commander.  Dis¬ 
tributing  this  training  across  a  variety  of  computer  platforms  promotes  more  accessible  training 
for  all  soldiers,  both  in  garrison  and  while  deployed. 

1.4  Organization  of  the  Thesis 

Chapter  2  provides  an  overview  of  approaches  for  image  segmentation.  Chapter  3  describes  the 
family  of  algorithms  for  image  segmentation  we  are  interested  in.  In  Chapter  4,  we  discuss  why 
algorithms  used  for  image  segmentation  cannot  be  directly  applied  to  segment  IEDs.  Chapter  5 
gives  an  overview  of  approaches  to  pre-process  an  image  to  achieve  better  segmentation  results. 
In  Chapter  6,  we  verify  that  our  approach  of  estimating  the  initial  object  pixel  results  in  a  more 
precise  segmentation.  Examples  how  image  segmentation  is  used  to  generate  new  training  im¬ 
ages  for  IED  detection  are  provided  in  Chapter  7.  Chapter  8  highlights  the  results  and  provides 
options  how  this  work  can  be  continued. 
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CHAPTER  2: 

Literature  Review 


2.1  General  Methods 

The  goal  of  image  segmentation  is  to  group  pixels  together  into  regions  of  similarity,  based 
on  various  features  extracted  from  an  image.  This  might  be  color  information  that  is  used  to 
create  color  models,  or  information  about  the  pixels  that  indicate  edges  or  boundaries  or  texture 
information  [12]. 

The  literature  provides  several  criteria  to  group  image  segmentation  algorithms.  Pham,  Xu, 
and  Prince  group  the  different  algorithms  based  on  the  grade  of  user- interactions  [13].  For  our 
purpose,  the  grade  of  interaction  is  not  of  particular  interest.  We  group  the  algorithms  similar 
to  Cufi,  Munoz,  Freixenet  and  Marti  [14]  based  on  the  input  information  they  require. 

In  general,  image  segmentation  methods  are  based  on  two  basic  properties  of  the  pixels  in 
relation  to  their  local  neighborhood:  discontinuity  and  similarity  [14].  Figure  2.1  gives  an 
overview  of  algorithms,  categorized  by  the  information  they  require  to  segment  a  colored  image 
into  object  and  background  regions. 

The  following  list  indicates  the  main  idea  of  the  algorithms  shown  in  Figure  2.1. 

•  Model-based  methods  segment  an  image  based  on  existing  geometrical  or  statistical  mod¬ 
els  of  the  object  itself.  Since  IED  types  quickly  change,  models  of  particular  types  are 
often  not  available.  Also,  to  generate  a  template  of  an  IED,  we  need  to  segment  a  similar 
IED  from  another  image.  This  approach  is  not  well  suited  in  the  context  of  IEDs  and  we 
exclude  this  type  of  algorithms  from  our  review. 

However,  we  do  use  the  idea  of  building  a  statistical  model.  Instead  of  building  a  model 
for  the  object,  we  build  statistical  models  for  the  background.  The  detailed  explanation  of 
our  statistical  model  is  shown  in  Chapter  5,  Novel  Segmentation  Algorithm  Development. 

•  Boundary -based  methods  rely  on  discontinuities  in  image  values  between  distinct  regions, 
and  the  goal  of  these  segmentation  algorithm  is  to  accurately  demarcate  the  boundary 
separating  these  regions  [15].  An  example  of  an  boundary-oriented  approach  is  Live 
wire. 
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Figure  2.1:  Algorithms  for  image  segmentation  categorized  by  the  information  they  are  including  to  segment  a 
colored  image  into  object  and  background  regions. 


•  Region-based  methods  are  based  on  the  similarity  property  of  the  pixels  belonging  to 
the  same  area.  The  goal  of  these  segmentation  algorithm  is  to  partition  regions  accord¬ 
ing  to  common  image  properties.  Examples  of  image  properties  are  intensity  values, 
textures  or  patterns.  Region-based  methods  can  be  sub-divided  into  pixel-oriented  and 
region-oriented  algorithms.  Commonly  used  examples  for  pixel-oriented  algorithms  are 
Thresholding  and  Clustering. 

•  Algorithms  combining  both  boundary-based  and  region-based  methods  aim  to  improve 
the  segmentation  result  by  including  both  the  discontinuity  and  the  similarity  property. 
Solely  boundary-based  and  region  based  methods  often  fail  to  produce  accurate  segmen¬ 
tation  results,  although  the  location  in  which  each  of  these  methods  fail  may  not  be  iden¬ 
tical  [16].  Both  approaches  suffer  from  a  lack  of  information  since  they  rely  on  ill-defined 
hard  thresholds,  which  may  lead  to  wrong  decisions.  The  goal  of  combined  approaches 
is  using  the  complementary  information  of  region-based  and  boundary-based  information 
to  reduce  the  problems  that  arise  in  each  individual  methods  [15]. 


Following  is  an  overview  and  short  explanation  of  all  algorithms  that  influence  our  work. 
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2.1.1  Live  wire 

Livewire  is  also  known  as  Intelligent  Scissors.  The  basic  idea  of  this  algorithm  is  to  formulate 
a  colored  image  as  a  directed  weighted  graph.  The  pixels  of  the  colored  image  represent  nodes 
in  the  graph,  connected  with  each  of  the  eight  adjacent  pixels.  The  costs  of  the  edges  represent 
the  discontinuity  between  their  represented  pixels.  Different  implementations  of  this  algorithm 
use  different  criteria  to  describe  the  abstract  term  discontinuity.  But,  in  general,  we  can  say  that 
similar  adjacent  pixels  have  a  low  discontinuity  and  therefore  result  in  small  costs  associated 
with  the  corresponding  edges. 

Assuming  that  the  colored  image  is  transferred  into  a  directed  graph,  we  can  find  the  path  that 
minimizes  the  costs  between  two  given  pixels  on  the  image.  This  path  is  the  shortest  path  in 
terms  of  costs.  If  a  user  provides  a  sequence  of  pixels  zo,  zi,  ..,zn- 1,  zn  that  are  close  to  the 
object’s  border,  the  algorithm  calculates  the  shortest  path  for  each  pair  of  consecutive  pixels 
(z^ i,  Zi)  for  all  i  in  [1,  n] .  Each  of  these  shortest  paths  becomes  a  line  segment,  and  the  set  of 
all  line  segments  defines  a  boundary  that  segments  the  object  from  the  background.  It  is  the 
user’s  responsibility  to  select  a  sequence  of  pixels  that  results  in  a  good  segmentation. 

There  exists  several  different  implementations  of  this  algorithm  [17],  [18],  [19],  [20],  [21].  They 
mainly  differ  in  the  way  the  user  can  add  new  points  to  the  boundary  and  the  cost  calculation 
representing  the  discontinuity  between  adjacent  pixels.  Since  we  are  not  focused  on  a  algorithms 
that  include  boundary  information  only,  we  do  not  review  the  different  input  possibilities. 

Boundary-based  algorithms  put  a  lot  of  effort  into  calculating  the  discontinuity  between  adja¬ 
cent  pixels.  These  calculations  are  of  interest  for  our  study  as  well.  All  approaches  we  review  do 
include  the  Gradient  magnitude,  the  Gradient  direction  and  zero  crossings  of  the  second  deriva¬ 
tive  into  their  cost  calculation  [17],  [22].  All  of  these  operators  are  used  for  edge  detection.  In 
our  approach,  we  include  only  the  gradient  magnitude  as  a  measurement  of  similarity  in  color 
between  adjacent  pixels.  The  detailed  calculations  are  shown  in  Chapter  5,  Novel  Segmentation 
Algorithm  Development. 

2.1.2  Thresholding 

Pixel-based  methods  are  the  simplest  segmentation  technique  and  define  the  segmentation  based 
on  individual  pixels  without  their  neighborhoods.  The  input  to  a  thresholding  operation  is  typi¬ 
cally  a  gray  scale  or  color  image.  Thresholding  is  based  on  the  assumption  that  different  regions 
in  an  image  have  distinct  intensity  or  color  distributions  as  well.  Based  on  this  assumption,  the 
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regions  of  an  image  can  be  segmented  based  on  the  mean  and  the  standard  deviation  of  each  dis¬ 
tribution.  In  simple  implementations  for  gray-scale  images,  the  segmentation  is  determined  by 
a  single  parameter  known  as  the  intensity  threshold.  Each  pixel  in  the  image  is  compared  with 
this  threshold.  Based  on  the  comparison,  each  pixel  is  individually  categorized  as  “belonging 
to  the  object”  or  “belonging  to  the  background.” 

Figure  2.2  shows  an  example  of  a  two-dimensional  gray  scale  image  and  the  corresponding 
intensity  histogram.  Assuming  the  intensity  thresholds  Tj  and  T2  are  known  and  the  intensity 
of  a  pixel  at  position  (x,y)  is  I(x,y),  the  object  can  be  segmented  from  the  background  by 
applying  following  compound  threshold  rule:  If  I(x,  y)  >  Tj  and  I(x,  y)  <  T2  than  (x,  y)  gets 
categorized  as  “belonging  to  the  object.” 


Figure  2.2:  The  first  row  shows  a  gray-scale  image  that  can  be  segmented  into  object  and  background  by  applying 
a  compound  threshold  rule.  The  second  row  shows  the  segmented  IED  and  the  associated  intensity  histogram. 

For  color  images,  different  thresholds  for  each  color  channel  or  specified  cuboid  in  RGB  space 
can  be  used  to  replace  the  single  parameter.  In  any  case,  the  key  parameter  in  the  threshold¬ 
ing  process  is  selecting  the  threshold  value.  Figure  2.3  scenario  A  shows  an  example  where 
defining  good  threshold  values  is  possible  but  difficult  since  the  IED  and  the  background  show 
similar  colors  and  therefore  the  RGB  histograms  overlap  partially.  In  the  case  of  scenario  B 
it  is  impossible  to  assign  threshold  values  in  RGB  space  due  the  completely  overlapping  RGB 
histograms. 

IEDs  are  often  colored  similar  to  the  background,  and  the  actual  IEDs  are  only  a  small  fraction 
of  the  complete  image.  Basing  a  segmentation  only  on  the  color  of  a  pixel,  and  not  including 
any  neighborhood  information,  does  not  seem  to  be  the  right  approach  in  the  context  of  image 
segmentation  for  IEDs.  However,  we  use  the  RGB  histograms  or  the  cuboid  in  RGB  space  to 
estimate  the  probability  that  the  color  of  a  specific  pixel  tends  to  belong  to  the  background  or 
to  the  object.  The  calculations  we  use  are  explained  in  detail  in  Chapter  5,  Novel  Segmentation 
Algorithm  Development. 

For  a  more  detailed  overview  and  explanation  regarding  the  different  approaches,  see  the  survey 
by  Sahoo,  Soltani  and  Wong  in  1988  [23].  Also,  Sezgin  and  Sankur  2004  and  Shapiro  2001 
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Figure  2.3:  Comparing  the  usefulness  of  histograms  to  define  the  threshold  for  separating  lEDs  from  the  back¬ 
ground.  The  image  in  scenario  A  shows  an  IED  where  thresholding  based  on  the  RGB  histograms  could  be  used  to 
separate  the  IED  and  the  background.  Scenario  B  shows  a  real-life  example  of  an  IED  where  the  color  histograms 
completely  overlap.  For  scenario  B,  thresholding  in  RGB  space  cannot  be  used  to  segment  the  object. 

categorize  the  different  techniques  based  on  the  information  the  algorithm  manipulates  [24]  and 
based  on  finding  a  threshold  [25]. 

2.1.3  Clustering 

In  general,  clustering  is  an  unsupervised  learning  task  aiming  to  partition  a  set  of  data  into 
clusters,  such  that  the  inner  class  similarity  and  the  heterogeneity  between  the  different  clusters 
is  maximized.  For  image  segmentation  we  like  to  partition  an  image  into  k  clusters,  s.t.  all 
pixels  belonging  to  the  same  cluster  are  similar,  and  pixels  belonging  to  different  clusters  are 
different.  Depending  on  the  purpose  of  the  segmentation,  similarity  can  be  defined  based  on 
distance ,  color ,  connectivity ,  intensity ,  etc. 

Clustering  algorithms  can  be  classified  as  hard  clustering  and  fuzzy  clustering.  In  the  case  of 
hard  clustering,  every  pixel  belongs  to  one  and  only  one  cluster,  where  for  fuzzy  clustering  a 
decimal  number  in  [0, 1]  indicates  the  tendency  to  belong  to  a  cluster  [26]. 

Hard  Clustering 

A  well-known  hard  clustering  algorithm  is  the  K-means  algorithm  first  introduced  by  Mac- 
Queen  in  1965  [27]  and  then  improved  by  Hartigan  and  Wong  [28].  Assuming  that  the  number 
of  resulting  clusters  is  given  as  k,  the  algorithm  works  as  follows: 

1.  Choose  k  pixels  as  initial  centroids,  one  for  each  cluster. 

2.  Compare  every  pixel  with  every  centroid  and  assign  the  pixels  to  the  most  similar  cluster. 

3.  Re-calculate  the  centroids  in  RGB  space  for  each  cluster. 
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4.  Repeat  step  2  &  3  until  the  position  of  all  centroids  stay  constant. 


As  a  result,  we  get  an  image  segmented  into  k  clusters,  and  each  cluster  having  it  is  own  cen¬ 
troid.  Taking  only  the  pixel  information  into  account,  is  not  precise  enough  to  segment  objects 
colored  similar  to  the  background.  But,  it  appears  that  clustering  is  a  memory  efficient  method 
for  representing  color  models.  For  our  work,  we  use  the  K-means  algorithm  to  generate  a  set  of 
background  models,  explained  in  detail  in  Chapter  5,  Novel  Segmentation  Algorithm  Develop¬ 
ment. 


2.2  Graph  Cut  Algorithms 

Graph  Cut  algorithms  state  one  approach  to  include  the  complementary  information  of  region- 
based  and  boundary-based  information  into  the  segmentation.  The  general  concept  of  using 
binary  graph  cut  algorithms  for  object  segmentation  was  first  proposed  and  tested  in  Boykov 
and  Jolly  2001  [8].  Since  the  segmentation  optimized  by  graph  cuts  combines  boundary  regular¬ 
ization  with  region-based  properties  [29],  this  thesis  is  mainly  focusing  on  image  segmentation 
by  graph-cut  algorithms. 

Image  Segmentation  using  Graph  Cut  algorithms  separates  the  segmentation  into  three  different 
aspects  of  the  problem: 

1 .  The  first  problem  is  including  prior  knowledge  into  the  segmentation  process  by  defining 
the  initial  regions,  background  and  object,  shown  on  a  colored  image.  This  task  becomes 
harder  as  distinguishing  between  foreground  and  background  gets  more  difficult.  There¬ 
fore,  the  particular  colored  image  to  segment  has  a  great  impact  on  the  usefulness  of  a 
selected  approach. 

2.  Once  the  initial  regions  are  defined,  an  undirected  graph  representing  the  colored  image 
and  the  defined  regions  has  to  be  generated.  All  graph  cut  algorithms  we  analyze  define 
the  undirected  network  G(V,E)  as  a  set  of  nodes  V,  edges  E,  and  capacities  C:  E  — y  M. 

3.  A  minimum  cut  algorithm  applied  on  the  generated  network  is  used  to  partition  the  net¬ 
work.  Because  of  the  way  the  network  is  constructed,  the  partitioning  of  the  network  is 
equivalent  to  the  segmentation  of  the  underlying  colored  image. 

Even  though  all  graph  cut  algorithms  address  these  three  aspects  of  the  problem,  the  actual 
implementation  of  the  different  aspects,  specially  defining  the  initial  regions,  vary  greatly  and 
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are  optimized  for  images  within  a  particular  domain.  To  provide  an  overview  about  the  differ¬ 
ent  efforts  to  provide  prior  knowledge  to  the  segmentation,  we  explain  three  different  labeling 
techniques  in  more  detail. 

2.3  Image  Segmentation  using  Graph  Cut 

We  found  several  algorithms  for  Interactive  Image  Segmentation.  But  even  the  more  advanced 
implementations  refer  to  the  approach  introduced  by  Boykov  and  Jolly  2001  [8].  Since  this 
segmentation  approach  states  the  baseline  for  all  more  advanced  algorithms,  this  algorithm  is 
described  first. 

The  Boykov  and  Jolly  algorithm  addresses  the  segmentation  of  a  monochrome  image.  This 
algorithm  divides  an  image  into  two  segments,  the  object  and  the  background.  The  user  has  to 
provide  the  following  information: 

1.  A  set  of  pixels  belonging  to  the  object. 

2.  A  set  of  pixels  belonging  to  the  background. 

3.  A  set  of  pixels  belonging  either  to  the  object  or  the  background. 


Figure  2.4  shows  how  this  is  done.  All  pixels  inside  the  red  polygon  definitely  belong  to  the 
object.  All  pixels  outside  the  yellow  polygon  definitely  belong  to  the  background.  It  is  uncertain 
to  which  segment  all  pixels  between  the  yellow  and  the  red  polygon  belong.  This  representation 
is  called  complete  labeling. 


Figure  2.4:  Monochrome  image  of  an  artillery-shell  found  in  Iraq  (image  published  by  http://www.globalsecurity.org). 
The  user  defined  the  set  of  pixels  which  belong  to  the  object  or  to  the  background. 


In  this  approach  segmentation  depends  on  the  generated  object  and  background  gray-level  his¬ 
tograms.  In  Figure  2.5,  the  first  row  shows  an  monochrome  image  and  the  related  gray-level 
histogram.  The  histogram  shows  that  the  monochrome  image  includes  three  different  regions. 
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The  second  and  third  rows  of  Figure  2.5,  show  the  image  divided  into  object  and  background 
and  the  related  histograms  after  segmentation.  The  gray-level  histogram  of  the  object  shows 
one  remaining  region.  This  observation  illustrates  the  assumption  that  for  a  good  segmentation 
the  opacity  is  coherent  and  the  object  appears  to  be  closed. 


Figure  2.5:  First  Row  shows  a  monochrome  image  of  an  artillery-shell  found  in  Iraq  (image  published  by 
http://www.globalsecurity.org)  and  the  related  histogram  of  the  gray-value  distribution.  The  second  and  the  third 
rows  show  the  histogram  for  the  object  and  the  background  after  they  were  segmented. 

Chapter  3,  Minimum  Cut  Algorithm  for  Image  Segmentation  explains  in  detail  how  the  related 
undirected  network  is  generated  and  how  a  Minimum  Cut  algorithm  is  used  to  segment  the 
object. 

2.4  Image  Segmentation  using  “GrabCut” 

This  algorithm  introduced  by  Rother,  Kolmogorov  and  Blake  2004  [7]  combines  the  following 
steps: 

1.  Obtaining  a  hard  segmentation  using  an  iterative  Graph  Cut  [8] 

2.  Border  Matting 


Obtaining  the  hard  segmentation 

The  main  difference  from  the  initial  graph  cut  algorithm  [8]  is  reduced  user  effort.  Instead  of 
defining  the  object  and  background  region  completely  as  shown  in  Figure  2.4,  an  example  of 
complete  labeling,  this  approach  requires  only  an  incomplete  labeling  as  a  first  step.  Initially 
only  the  pixels  belonging  to  the  background  are  defined  as  shown  in  Figure  2.6.  The  pixels 
belonging  to  the  object  are  obtained  automatically  and  not  defined  by  the  user.  The  segmentation 
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task  follows  the  same  principle  as  the  Graph  Cut.  If  the  segmentation  is  not  precise  enough,  the 
user  provides  additional  labels  to  improve  the  result.Because  every  new  label  provides  more 
information  the  segmentation  task  has  to  be  repeated.  The  final  result  is  derived  in  a  number 
of  iterations.  Each  of  these  iterations  consist  of  a  user  input  and  a  following  segmentation. 
Therefore,  this  approach  is  called  iterative  Graph  Cut  [7]. 


Figure  2.6:  A  colored  image  of  an  artillery-shell  found  in  Iraq  (image  published  by  http://www.globalsecurity.org). 
The  yellow  rectangle  shows  the  label  the  user  added  initially.  Similar  to  the  Graph  Cut,  every  pixel  outside  this 
rectangular  area  belongs  to  the  background.  The  red  lines  represent  additional  labels  a  user  might  have  added  in 
order  to  improve  the  result. 


Border  matting 

In  addition  to  the  segmentation  it  might  be  necessary  to  produce  a  clean  output  of  the  object. 
However,  mixed  pixels  along  the  object  boundaries,  belong  to  the  object  but  their  appearance 
changes  depending  on  the  background.  The  pixels  close  to  the  boundary  between  the  object  and 
a  bright  background  appear  to  be  different  than  the  pixels  close  to  the  boundary  between  the  ob¬ 
ject  and  a  dark  background.  The  occurrence  of  mixed  pixels  requires  further  image  processing. 

The  GrabCut  algorithm  [7]  uses  border  matting  to  deal  with  the  problem  of  matting  in  the 
presence  of  blur  and  mixed  pixels  along  smooth  object  boundaries.  A  simplified  explanation 
of  border  matting  is  to  color  the  mixed  pixels  similarly  to  the  close  object  pixels.  After  adding 
transparency  to  these  mixed  pixels,  such  that  the  transparency  increases  as  the  distance  to  the 
border  increases,  the  boundary  appears  to  be  smooth.  For  a  more  detailed  explanation  of  border 
matting,  see  the  “GrabCut”  by  Rother,  Kolmogorov  and  Blake  in  2004  [7]. 
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2.5  Image  Segmentation  using  an  Adaptive  Gaussian  Mix¬ 
ture  Markov  Random  Field  (GMMRF)  model 

This  approach  introduced  by  authors  Blake,  Rother  and  Brown  2007  addresses  the  hard  seg¬ 
mentation  problem  when  the  object  and  background  color  distributions  overlap  at  least  in  parts. 

One  big  difference  from  the  two  approaches  explained  above  is  the  labeling.  Instead  of  roughly 
estimating  the  object  and  the  background  separately  as  done  for  the  basic  graph  cut  algorithm, 
the  regions  are  defined  in  one  step  by  using  a  fat  pen  trail  enclosing  the  object  boundary.  Figure 
2.7  shows  a  fat  pen  trail  enclosing  the  object  boundary.  Similar  to  the  GrabCut  algorithm,  this 
approach  allows  adding  more  labels  iteratively  to  improve  the  segmentation. 


Figure  2.7:  A  fat  pen  trail  encloses  the  object  boundary.  Every  pixel  outside  the  trail  belongs  to  the  background. 
Every  pixel  inside  the  trail  belongs  to  the  object.  For  all  pixels  covered  by  the  trail,  it  is  not  clear  to  which  segment 
they  belong. 

The  regions  of  the  object  and  the  background  define  training  regions  to  build  a  probabilistic 
formulation  of  the  model  in  terms  of  a  Gaussian  Mixture  Markov  Random  Field  (GMMRF). 
Secondly,  a  pseudo  likelihood  algorithm  is  derived  to  learn  the  colour  mixture  and  coherence 
parameters  for  foreground  and  background.  Li  1995  [30]  and  Perez  1998  [31]  explain  the 
different  types  of  Markov  Random  Fields  and  their  relation  to  (colored)  images. 

The  segmentation  task  is  to  label  all  pixels  in  the  “unclassified”  region  as  belonging  either  to  the 
object  or  to  the  background.  Based  on  the  following  incomplete  list  of  criteria  and  information, 
the  algorithm  decides  if  a  pixel  belongs  to  the  object: 


•  Matching  the  labels  of  adjoining  labeled  pixels 

•  Models  for  color  and  texture  properties  of  the  object  and  background  pixels.  The  models 
are  learned  from  the  related  training  regions. 
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Similar  to  the  “GrabCut”  [7],  the  user  can  add  missing  parts  of  the  object  by  adding  additional 
object  labels. 

A  similar  approach  regarding  the  usage  of  Adaptive  Gaussian  Mixture  Markov  Random  Fields 
is  introduced  by  Kato  and  Pong  2006  [32]. 
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CHAPTER  3: 

Minimum  Cut  Algorithm  for  Image 
Segmentation 


3.1  Detailed  Example  of  the  Initial  Graph  Cut  Algorithm  of 
Boykov  and  Jolly  2001 

Graph  Cut  [8]  of  a  monochrome  image  is  a  state-of-the-art  algorithm  for  interactive  image 
segmentation  and  the  baseline  for  many  more  advanced  algorithms.  Since  all  algorithms  we 
found  require  user-defined  object  labels,  either  as  initial  labels  or  iteratively  added  labels,  we 
decided  to  start  using  the  basic  algorithm  of  Boykov  and  Jolly  and  improve  it  to  match  our 
requirements. 

3.1.1  Input 

The  algorithm  requires  two  different  types  of  user  input: 

•  The  monochrome  image  showing  the  object  and  the  background. 

This  image  is  also  expressed  by  a  1-dimensional  array  of  gray  values  z  =  (zi, ..,  zn, ..,  zN), 
where  n  is  the  number  of  pixels  in  a  row,  m  is  the  number  of  rows  and  N  is  the  total  num¬ 
ber  of  pixels,  N  —  n*  m. 

•  The  initial  estimation  of  object  and  background  regions. 

The  regions  are  user  provided,  as  shown  in  Figure  3.1  image  A.  The  separation  of  the 
image  in  the  three  regions  -  object,  background  and  unknown  -  is  called  a  trimap  T  = 
{T0,  Tb,  Tv},  where  T0  is  the  set  of  pixels  labeled  as  object  by  the  user,  TB  is  the  set  of 
background  pixels,  and  Tjj  is  the  unknown  region. 


3.1.2  Optimization  Variable 

The  segmentation  of  the  image  is  expressed  as  a  vector  of  unknown  variables  a  —  (au,  ..,an, ..,  a at). 
The  values  of  a*  show  whether  the  related  pixel  tends  to  be  a  member  of  the  foreground  (object) 
or  the  background.  Usually  0  <  ct;  <  1,  Vi  €  N. 
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Figure  3.1 :  Image  A  shows  an  example  of  typical  object  and  background  labels  used  to  define  the  training  regions 
for  the  Boykov  and  Jolly  algorithm  of  2001 .  The  Images  B  and  C  show  the  directly  assembled  gray-value  histograms 
for  the  background  (B)  and  the  object  (C) 


A  segmentation  is  a  hard  segmentation  if  ay  G  {0, 1}.  In  this  case  ay  =  1  if  the  pixel  zt  belongs 
to  the  object  and  ay  =  0  if  the  pixel  /~,  belongs  to  the  background.  We  are  focused  on  a  hard 
segmentation. 

3.1.3  Segmentation  by  Minimum  Cut  Algorithm 

A  pair  of  histograms  is  directly  assembled  from  the  user-defined  training  areas.  The  histogram 
in  Figure  3. 1  image  B,  shows  the  distribution  of  gray- values  for  all  pixels  belonging  to  the  back¬ 
ground.  The  second  histogram  shows  the  distribution  of  gray-values  for  all  pixels  belonging  to 
the  object.  In  this  document,  we  call  the  histogram  associated  with  the  background  h(z ,  0) 
and  the  histogram  associated  with  the  object  h(z,  1).  Equation  (3.1)  shows  the  mathematical 
expression  we  use  for  the  pair  of  histograms  6. 

6  =  {h(z,  a),  a  G  (0, 1}}  (3.1) 


An  objective  function  E,  shown  in  Equation  (3.2),  is  defined  so  that  its  minimum  corresponds 
to  a  good  segmentation.  The  literature  calls  this  objective  function  Gibbs  Energy  or  just  Energy 
function. 

For  all  Graph  Cut  algorithms  we  study,  this  objective  function  is  the  sum  of  the  two  functions 
U  and  V,  where  the  function  U  is  a  region  property  which  evaluates  the  consistency  of  the  his¬ 
tograms  of  object  and  background,  and  the  function  V  is  a  smoothness  term J  boundary  property 
which  penalizes  the  difference  between  neighboring  pixels. 

For  example,  U(ot ,  9,  z )  can  be  thought  of  a  sum  of  functions,  one  for  each  pixel  describing 
how  the  gray-value  of  pixel  zt  fits  into  the  assembled  histogram  of  the  object  h(z,  1).  Similarly, 
V (ay  z)  can  be  thought  of  as  a  sum  of  functions  on  neighboring  pixels  with  V (zi}  ay,  Zj,  ay) 
measuring  the  similarity  between  adjacent  pixels  z%  and  zr  The  function  V ( Z{ ,  at,  z3.  ay)  results 
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in  large  values  when  similar  pixels  are  separated  and  small  values  otherwise.  If  a  pair  of  adjacent 
pixels  is  not  separated,  meaning  their  associated  a  values  are  equal,  V (, z* ,  ai:  zv  ay)  is  zero. 

V  is  defined  on  a  neighborhood  of  pixels.  Depending  on  the  neighborhood,  the  euclidean  dis¬ 
tance  between  adjacent  pixels  might  be  different,  therefore  V  is  a  weighted  sum  using  the  r/.J.  /  } 
as  weights. 


E(a,6,z)  =  U(a,  9,  z)  +  V(a,  z) 


(3.2) 


A  good  segmentation  results  in  a  small  value  for  U  because  the  gray-values  of  all  pixels  labeled 
as  “belonging  to  the  object”  are  consistent  with  the  histogram  h(z ,  1)  and  all  pixels  labeled  as 
“belonging  to  the  background”  are  consistent  with  the  histogram  h(z,  0).  A  good  segmentation 
also  shows  small  penalty  values  for  V.  Figure  3.2  shows  example  values  for  both  terms  for  one 
particular  pixel  z*. 


Region  Property  depending  on  the 
probability  that  a  gray-value  fits  a  histogram 


Probability  that  a  gray-value  x  fits  a  histogram 


Boundary  Property  depending  on  the  differend 
gray  values  between  neighboring  pixels 


> 

£ 


Q_ 


Figure  3.2:  The  plot  on  the  left  side  shows  the  region  property  U  depended  on  how  the  gray-value  of  pixel  %  fits 
into  an  assembled  histogram.  The  plot  on  the  right  side  shows  the  dependency  between  the  boundary  property  an 
the  difference  in  gray-value  between  adjacent  pixels. 


Since  U  and  V  are  both  penalty  functions,  a  good  segmentation  can  be  obtained  by  finding 
the  values  7?  that  minimize  the  objective  function  E  as  shown  in  Equation  (3.2).  Meaning  the 
smallest  amount  of  Energy  required  to  draw  a  border  that  segments  the  image  into  object  and 
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background.  Mathematically,  we  need  to  solve  Equation  (3.3)  to  find  the  segmentation  for  a 
given  image. 


a  =  arg  min  E (a,  9 ,  z ) 

a 


(3.3) 


3.1.4  A  Segmentation  Objective  Function 

We  fully  specify  the  objective  function  (3.3)  using  (3.4). 


En 

-log  (h(zn,an)) 

n=  1 


(3.4) 


V(a,z)  =  7 


( m,n)eC 


dis(m,  n 


[an^am\e  p{Zm  Zn) 2 


The  expressions  of  (3.4),  contain  two  unexplained  constants  7  and  (5.  The  constant  7  is  used  to 
weight  the  smoothness  term  U  compared  to  the  region  property  V. 

Boykov  and  Jolly  2001  [8]  select  (3  dependent  on  the  expected  difference  in  gray-value  among 
adjacent  pixels  over  an  image  sample  as  shown  in  Equation  (3.5).  Based  on  the  results  of 
Boykov  and  Jolly  2001,  this  choice  of  (3  ensures  that  the  exponential  term  in  V  switches  ap¬ 
propriately  between  high  and  low  contrast.  Other  algorithms  define  (3  to  be  zero,  to  encourage 
smoothness  everywhere  on  the  image.  If  (3  is  set  to  be  zero,  the  resulting  boundary  property  is 
called  Ising  prior. 

0  =  2£2[;,„  -  *„]  <3'5) 

3.1.5  Using  the  Minimum  Cut  Algorithm  to  Find  a  Hard  Segmentation 

Following,  the  segmentation  algorithm  defines  an  undirected  network  G(V,E)  consisting  of 
nodes  V,  edges  E,  and  capacities  C:  E  — )■  M. 


•  Nodes  V:  All  pixels  Zj  of  the  monochrome  image  become  a  node.  In  addition,  two  arti¬ 
ficial  nodes  S  and  T  are  added.  S  represents  the  source  or  supply  node,  and  T  represents 
the  sink  or  demand  node.  S  and  T  are  also  called  terminals. 
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•  Edges  E:  The  set  of  edges  E  consists  of  two  types  of  undirected  edges:  neighborhood 
links  and  terminal  links,  or  n-links  and  t-links  for  short.  Each  pixel  Zi  has  two  t-links 
(S,Zj)  and  (z,,T)  connecting  it  to  each  terminal.  In  addition,  the  n-link  (zt,Zj)  exists  if  the 
pixel  Zj  is  in  the  neighborhood  of  the  pixel  zt. 

For  image  segmentation,  the  neighborhood  is  usually  defined  as  a  horizontal,  vertical  and 
diagonal  neighborhood.  This  means  that  each  pixel  can  have  up  to  8  neighbors. 

•  Capacities  C:  The  capacities  for  edges  between  pixel-nodes  correspond  to  the  smoothness 
term  in  Equation  (3.4).  The  different  algorithms  use  different  formulas  for  creating  these 
terms.  Table  3.1  gives  the  capacities  of  the  edges  similar  to  Boykov  and  Funka-Lea 
2006  [29].  Different  to  Equations  (3.5)  and  (3.4),  Boykov  and  Funka-Lea  2006  use  the 
parameter  a  to  refer  to  the  expected  difference  in  gray-value  among  adjacent  pixels  over 
the  image. 


Edge 

Capacity 

for 

(Zh  Zj) 

1  Hii-ij? 

dis(i,j ) C 

n-links 

(S,  Zi) 

inf. 

0 

G(h(z,  1  ),Zi) 

Zi  belongs  to  object,  cr*  =  1 

Zi  belongs  to  background, cr*  =  0 
Zi  belongs  to  unknown  region 

C Zi,T ) 

0 

inf. 

G(h(z,0),Zi) 

Zi  belongs  to  object,  at  —  1 

Zi  belongs  to  background, cr*  =  0 
Zi  belongs  to  unknown  region 

Table  3.1 :  Table  gives  the  capacity  of  edges  similar  to  the  capacities  given  by  Boykov  and  Funka-Lea  (2006). 


The  segmentation  of  the  image  into  object  and  background  is  achieved  by  finding  the  minimum 
s-t  cut  of  the  associated  network  G(V,E).  An  example  of  a  completely  defined  network,  the 
minimum  s-t  cut  and  the  resulting  segmentation  is  shown  in  Figure  3.3. 

3.2  Modifying  the  Initial  Graph  Cut  Algorithm  to  Segment 
Colored  Images 

The  initial  Graph  Cut  Algorithm,  as  explained  in  the  first  part  of  this  chapter,  addresses  the  seg¬ 
mentation  of  monochrome  images.  This  section  provides  a  rough  overview  about  the  necessary 
modifications  on  the  initial  algorithm  to  segment  colored  images  as  well. 
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B 

a)  Monochrome  Image 


d)  Segmentation 

It 


Figure  3.3:  A  segmentation  example  for  a  3x3  monochrome  image,  a)  Shows  the  three  regions  initially  defined  by 
the  user.  The  pixel  at  position  (3,3)  belongs  to  the  object,  the  pixel  at  position  (1,1)  belongs  to  the  background,  and 
all  other  pixels  belong  to  the  unknown  region,  b)  Shows  the  network  representation  of  the  image.  The  capacity  of 
each  edge  is  reflected  by  the  edge’s  thickness,  c)  Shows  the  minimum  cut  drawn  as  a  red  line,  d)  Shows  the  result 
of  the  segmentation  task. 


3.2.1  Defining  the  Initial  Regions 

For  the  first  version  of  our  algorithm,  we  follow  the  approach  of  Boykov  and  Jolly  (2001)  and 
ask  the  user  to  define  both  training  regions,  one  for  the  object  and  one  for  the  background,  as 
shown  in  the  two  examples  in  Figure  3.4. 

The  two  scenarios  in  Figure  3.4  show  how  the  user-defined  labels  influence  the  segmentation 
algorithm  and  could  change  the  results  completely.  This  observation  and  potential  solutions  for 
this  issue  are  discussed  in  detail  in  Chapter  4. 


3.2.2  Generating  the  Object  and  Background  Model 

None  of  the  Graph  Cut  algorithms  we  review  uses  the  RGB  color  model  since  it  is  not  sufficient 
for  comparing  color  distributions  [7],  [9],  [33].  Without  questioning  this  statement,  we  first 
implement  the  naive  approach  and  replace  the  initially  used  gray  value  histograms  for  the  object 
and  background  by  two  sets  of  RGB  histograms.  Each  set  consists  of  3  histograms,  one  for  each 
color  component  in  the  RGB  color  space. 

Figure  3.5  gives  an  example  of  a  colored  image,  the  user-provided  object  and  background  la¬ 
bel  and  the  resulting  estimated  color  model  for  the  object  and  the  background.  Note  that  the 
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Figure  3.4:  Two  examples  of  object  and  background  labels  for  the  same  colored  image.  In  scenario  A,  the  training 
area  of  the  object  includes  the  black  bag  and  therefore  segments  the  actual  IED  and  the  bag  from  the  background. 
Wherein  scenario  B,  the  bag  is  included  to  the  training  area  of  the  background  and  therefore  only  the  IED  is  seg¬ 
mented  from  the  background. 

set  of  histograms  also  includes  an  overlay  histogram.  The  overlay  histogram  shows  a  colored 
graphical  distribution  of  all  RGB  colors  where  the  tonal  range  for  the  RGB  colors  overlap.  Both 
overlay  histograms  in  Figure  3.5  show  similar  distributions  among  the  related  RGB  histograms. 
Therefore  the  same  segmentation  result  can  be  achieved  by  considering  only  one  color  com¬ 
ponent  instead  of  all  three.  In  the  given  example  the  histograms  of  the  red  component  alone 
add  enough  information  to  segment  the  shown  IED  from  the  background.  However,  examples 
of  real-world  IEDs  show  that  we  do  not  have  enough  information  to  clearly  decide  if  a  pixel 
belongs  to  the  IED  or  to  the  background. 

3.2.3  Generating  the  Network  Representing  the  Segmentation  Problem 

Besides  the  capacities  of  the  edges,  the  undirected  network  G(V,E)  is  built  the  same  way  as  for 
the  segmentation  of  a  monochrome  image. 

Different  from  the  segmentation  of  a  monochrome  image,  we  include  the  information  given 
by  the  three  different  distributions  of  the  RGB  color  components.  Table  3.2  gives  the  actual 
implemented  capacities  we  use. 

3.2.4  Finding  the  Segmentation  -  Nearly  Orthogonal  Latin  Hypercube 

The  capacities  of  the  t-links  given  by  Table  3.2  use  the  parameter  A  to  weight  the  region  and  the 
boundary  property.  From  a  functional  point  of  view,  this  parameter  is  similar  to  the  parameter 
7  in  the  first  section  of  this  chapter. 

The  function  we  use  to  calculate  the  capacities  of  the  n-links  is  similar  to  the  function  used 
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Figure  3.5:  Image  shows  two  sets  of  RGB  color  histograms.  Each  set  consists  of  3  histograms,  one  for  each  color 
component  in  the  RGB  color  space.  The  fourth  histogram  is  an  overlay  of  the  three  separate  RGB  color  components 
and  shows  a  colored  graphical  distribution  of  all  colors  where  the  tonal  range  for  the  RGB  colors  overlap. 


by  Boykov  and  Jolly  (2001).  This  function  penalizes  discontinuities  between  pixels  of  similar 
color  when  the  average  color  difference  is  smaller  than  a.  The  penalty  gets  smaller  as  the 
average  color  difference  increases.  Because  of  this  behavior  Boykov  and  Jolly  (2001)  refer  to 
this  parameter  a  as  camera  noise. 

Technically,  we  have  a  third  “hidden”  parameter  indicating  the  number  of  decimals  we  use  to 
define  the  edge  capacities.  There  are  some  estimates  available  regarding  the  “best”  values  for  A 
and  a,  but  we  do  not  know  what  number  of  decimals  are  sufficient.  We  run  a  series  of  test  to 
explore  the  impact  of  these  three  parameters  on  the  segmentation  result. 

To  limit  the  number  of  tests  to  perform  and  to  still  have  a  space  filling  design  of  experiment, 
we  use  the  Nearly  Orthogonal  Latin  Hypercube  [34],  [35]  design  of  experiment.  An  orthogonal 
design  enables  regression  analyzes  where  it  is  not  possible  to  actually  perform  a  test  for  all 
possible  combinations  of  all  parameter  values.  We  used  an  Excel  spreadsheets  published  by 
MacCalman’s  2012  to  generate  the  design  points  for  our  series  of  tests. 

The  tests  performed  in  this  chapter  have  two  purposes: 


•  To  show  that  our  implementation  of  Boykov  and  Jolly’s  algorithm  can  be  used  to  segment 
simple  IEDs  from  the  background. 
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Edge 

Capacity 

Implementation 

{zii  zj) 

V(zi,zj) 

( 

d 

'  =  (  E  (Zi,c  ~  zJ,c)2  ) 

\c€r,g,b  ) 

(S,Zi) 

< 

inf  if  Zi  belongs  to  object 

0  if  Zi  belongs  to  backg. 

F(h(z,  a  =  1),  zf)  otherwise 

< 

8  max{V  (2^,  zf}  +  1 

0 

— AlogP{z,:  G  backg.  /i(z,  0)} 

(zh  T) 

< 

0  if  Zi  belongs  to  object 

inf  if  Zi  belongs  to  backg. 

F(h(z,a  =  0  ),zf)  otherwise 

< 

'0 

8  max{V  (zt,  zf}  +  1 
— AlogPjz,:  G  object \h(z,  1)} 

Table  3.2:  Table  gives  the  first  version  of  capacities  we  use  to  segment  a  colored  image.  The  capacities  are  similar 
to  the  capacities  shown  in  Table  3.1  but  include  the  information  given  by  the  RGB  color  components. 

•  To  define  the  domain  for  the  parameters  for  the  future  tests. 

For  our  first  test  series,  we  define  the  domain  of  the  parameters  as  following:  A  G  [0, 1],  a  G 
[20, 100]  and  the  number  of  decimals  in  [0, 4].  We  also  manually  define  a  “perfect”  segmentation 
to  compare  the  segmentation  result  for  each  design  point.  As  measurement  of  goodness,  we 
estimate  the  total  number  of  errors  by  counting  the  misclassified  pixels. 

Figure  3.6  left  plot  shows  that  the  number  of  errors  varies  a  lot  if  the  number  of  decimals  is  zero 
or  one,  but  does  not  seem  to  get  influenced  as  long  as  the  number  of  decimals  is  at  least  two. 
Based  on  this  observation,  we  keep  the  number  of  decimals  constant  equals  to  four  for  all  future 
test  series. 

Figure  3.6  middle  plot  shows  that  there  is  a  direct  relation  between  the  number  of  errors  and 
A  indicating  the  weight  of  the  region  property.  We  get  the  highest  number  of  errors,  if  A  is 
equals  to  one,  meaning  that  only  the  region  property  is  taken  into  account.  This  underlines  our 
previous  assumption  that  a  segmentation  using  only  the  region  property  is  not  sufficient  for  our 
purposes.  For  all  future  test,  we  limit  the  domain  of  A  to  [0,  0.5]. 

This  first  test  series  does  not  provide  any  information  that  allows  to  limit  the  range  of  the 
parameter  a,  as  shown  in  Figure  3.6  right  plot.  Therefore,  we  keep  to  domain  as  [20, 100]  for 
the  future  test  series. 
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Figure  3.6:  Results  of  first  test  series:  Plots  show  the  total  number  of  errors  dependent  on  the  number  of  decimals 
(left),  the  values  for  parameter  A  (middle)  and  the  values  for  parameter  a  (right). 


Figure  3.7  shows  two  examples  of  simple  IEDs  used  for  the  first  test  series,  their  training  regions 
and  the  achieved  segmentation  results. 


Figure  3.7:  Two  example  of  IEDs  segmented  from  their  background.  The  parameter  values  for  both  segmentations 
are  a  =  0.1  and  a  =  100. 
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CHAPTER  4: 

Failure  of  Basic  Algorithm 


4.1  Example  of  a  Real-World  IED 

Unfortunately,  the  implemented  algorithm  as  shown  in  Chapter  3  cannot  be  used  to  segment  a 
more  realistic  IED  from  the  background.  The  image  shown  in  Figure  4. 1  left  side,  shows  an 
IED  that  is  colored  similar  to  the  background  and  also  partially  covered  by  material  from  the 
background.  In  order  to  generate  a  complete  model  for  the  object,  it  is  necessary  to  include  parts 
of  the  artillery  shell  and  the  cell  phone  into  the  training  area  for  the  object,  as  shown  in  Figure 
4.1  middle  image.  The  image  in  Figure  4.1  right  side  shows  that  using  the  same  parameter 
settings  as  of  Chapter  4  is  not  sufficient  for  good  segmentation  results. 


Figure  4.1 :  Example  of  a  real-world  IED  that  is  more  difficult  to  segment  from  the  background. 


4.2  Weaknesses  of  the  Basic  Algorithm 

We  identify  two  major  issues  that  are  due  the  different  type  of  objects  we  like  to  segment  from 
the  background.  Figure  4.2  shows  a  complete  colored  image  that  needs  to  be  segmented. 

Different  from  other  segmentation  tasks,  real-world  IEDs  are  only  a  small  fraction  of  the  com¬ 
plete  image.  Meaning  that  the  training  area  for  the  background  model  is  extremely  large  com¬ 
pare  to  the  training  area  of  the  IED.  Not  surprisingly,  as  bigger  the  training  area  of  the  back¬ 
ground  gets,  as  more  different  color  colors  codes  are  included  to  the  model  of  the  background. 
And  therefore,  while  building  the  network  representing  the  segmentation  problem  we  overesti¬ 
mate  the  capacities  for  the  edges  connecting  the  pixel  nodes  and  the  T-node. 

Another  weakness  is  that  IEDs  do  not  necessary  appear  as  a  single  object.  Often,  charges  and 
initiation  systems  appear  -if  visible-  as  different  objects.  Figure  4.2  shows  an  IED  that  consists 
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of  two  different  parts.  These  different  parts  cannot  be  included  into  one  single  training  area 
without  adding  background  as  well.  In  other  words,  from  the  beginning  we  include  background 
pixels  to  the  final  result  since  we  labeled  them  as  “belonging  to  the  object.”  Wrong  initial  labels 
cannot  result  in  a  good  segmentation. 

Another  issue  is  due  to  a  wrong  assumption  related  to  the  color  models  we  build.  Replacing  the 
single  histogram  of  gray-values  by  three  independent  histograms,  one  for  each  color  component 
assumes  independence  between  the  color  components.  But,  we  know  that  the  RGB  color  com¬ 
ponents  are  not  independent.  Therefore,  we  can  say  that  the  color  model  we  use  for  Chapter  4 
is  not  precise  enough,  even  if  it  is  sufficient  for  simple  IEDs. 


The  development  of  different  potential  solutions  in  order  to  solve  these  three  issues  is  shown  in 
Chapter  5,  Novel  Segmentation  Algorithm  Development. 


Figure  4.2:  Example  of  a  complete  colored  image  we  need  to  segment.  The  shown  IED  consists  of  two  parts  at 
different  locations.  Therefore  the  IED  does  not  appear  as  one  single  object. 
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CHAPTER  5: 

Novel  Segmentation  Algorithm 
Development 


5.1  Defining  the  Training  Area  for  the  Background 

This  section  explains  approaches  we  use  to  define  the  training  area  for  the  background.  The 
biggest  difference  from  images  used  for  the  algorithms  listed  in  Chapter  2  is  the  small  fraction 
of  pixels  belonging  to  the  object.  An  easy  approach  to  handle  this  issue  is  to  manually  define 
a  subimage  that  only  shows  the  IED  and  the  close  surroundings.  This  workaround  avoids  the 
problem  instead  of  solving  it.  Therefore,  we  aim  for  an  approach  that  takes  the  colored  image 
as  it  is  and  builds  a  model  considering  the  complete  background  area. 

5.1.1  Single  Training  Area  for  the  Background 

Figure  5.1  gives  a  typical  example  of  a  colored  image  we  need  to  segment.  The  image  consists 
of  3456*2592  pixels  where  less  than  1%  do  belong  to  the  actual  IED.  As  a  result  of  this  imbal¬ 
ance  between  IED  and  background  area,  the  RGB  color  histograms  for  the  background  include 
almost  the  complete  color  space  of  the  actual  image. 

Based  on  our  assumption,  the  training  area  for  the  IED  cannot  be  provided  manually.  Therefore, 
we  like  to  estimate  the  IED  area  by  identifying  colors  unlikely  to  appear  in  the  user-provided 
background.  But,  since  the  background  includes  almost  the  complete  color  space  of  the  image, 
no  color  codes  are  unlikely. 

5.1.2  Multiple  Training  Areas  for  the  Background 

The  main  idea  for  this  approach  is  based  on  the  our  answer  for  the  question:  What  makes  a  color 
unusual?  Considering  the  image  shown  in  Figure  5.1,  the  color  of  the  actual  IED  is  not  unusual. 
The  same  color  is  observed  frequently  in  the  rocky  area  of  the  image.  But,  in  the  background 
the  colors  of  the  IED  are  never  observed  on  the  street.  Or  generally  speaking,  a  color  is  unusual 
if  the  color  is  unlikely  to  appear  together  with  its  surrounding. 

We  cut  the  background  into  smaller  rectangles  of  similar  size,  as  shown  in  Figure  5.2,  left 
image.  As  a  result,  we  get  several  areas  showing  colors  that  appear  next  to  each  other  in  the 
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Figure  5.1 :  Example  of  a  colored  Image  we  need  to  segment  and  the  resulting  RGB  histograms  after  using  a  single 
background  area  as  defined  by  excluding  the  red  circled  areas. 


background.  This  approach  allows  to  define  many  color  models  of  the  background,  without 
including  all  observed  colors  into  a  single  color  model. 

It  is  theoretically  possible  to  divide  the  background  exactly  between  two  different  colors.  In 
that  case,  none  of  the  background  areas  would  show  the  two  different  colors  together,  and  the 
appearance  of  both  colors  would  be  classified  as  unusual.  In  all  tests  we  do,  this  problem  never 
appears.  Nevertheless,  Figure  5.2,  right  image  shows  artificially  added  overlapping  areas  to 
address  this  issue.  The  overlapping  areas  have  similar  size  as  the  original  rectangles  and  are 
centered  at  each  comer  of  an  original  rectangle. 


Figure  5.2:  Examples  of  a  user-provided  training  area  of  the  background  and  the  associated  break  down  into  smaller 
rectangles  of  the  same  size.  In  addition,  the  image  on  the  right  shows  also  the  overlapping  areas  between  adjacent 
rectangles. 
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5.2  Color  Model  Representing  the  Training  Area  of  the  Back¬ 
ground 

To  evaluate  the  region  property,  we  need  to  define  a  color  model  that  describes  the  background. 
This  measures  how  well  a  pixel’s  color  fits  into  the  background.  This  section  explains  the 
different  color  models  we  use. 

5.2.1  Independent  RGB  Color  Histograms 

Boykov  and  Jolly’s  initial  approach  addresses  the  segmentation  of  a  monochrome  image,  that 
can  be  fully  described  by  a  1 -dimensional  vector  of  gray  scale  values.  Different  from  gray 
scale  values,  RGB  colors  are  3 -dimensional,  and  therefore  cannot  be  expressed  as  a  single  1- 
dimensional  vector.  Let  RGB*  be  the  set  of  (r,  g.  b )  tuples  appearing  in  the  background,  than 
RGB*  is  the  color  space  of  this  background  and  a  subset  of  RGB ,  a  set  that  includes  all  possible 
tuples.  Let  r,  be  the  number  of  all  tuples  in  RGB*  where  the  red  color  component  is  equals  to  i, 
and  let  i  be  in  [0,  255].  We  can  think  of  ry  as  it  is  defined  in  Equations  (5.1),  as  the  projection  of 
RGB*  on  the  red  color  component.  Similarly,  RGB*  can  be  projected  on  the  color  components 
green  and  blue. 


n  =  \{(ri,g,b)  G  RGB*}\,\/i  e  [0,255]  (5.1) 

9i  =  \{(r,gi,b)  e  RGB*}\,\/i  e  [0,255] 

bi  =  \{(r,g,bi )  G  RGB*}\,\/i  e  [0,255] 

These  projections  can  be  visualized  as  three  color  histograms,  one  for  each  of  the  color  com¬ 
ponents,  as  shown  in  Figure  5.1.  After  projection,  the  information  regarding  the  occurrence 
of  color  codes  is  lost.  To  illustrate  that  estimating  the  region  property  based  on  projections  is 
insufficient,  we  generate  the  three  different  images  shown  in  Figure  5.3  left  side.  The  images 
show  different  colors,  and  therefore  have  a  different  color  space  associated  with  them.  But,  the 
histograms  of  their  projections  are  similar  to  the  set  of  histograms  shown  in  Figure  5.3. 

Imagine,  we  use  these  as  a  color  model  to  estimate  if  the  RGB  colors:  black  -  (0,0,0),  red  - 
(255,0,0)  and  violet  -  (255,255,0)  fit  into  the  background.  Then,  the  histograms  let  us  believe 
that  all  colors  black ,  red  and  violet  are  likely  to  appear  in  the  background,  which  is  wrong. 
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This  example  shows  that  projections  of  a  color  space  are  not  sufficient  to  estimate  the  probability 
that  a  particular  color  code  appears  in  the  background,  and  can  lead  to  wrong  segmentation 
results. 


Figure  5.3:  The  images  on  the  left  shows  three  different  backgrounds.  All  three  backgrounds  result  in  the  same 
independent  RGB  color  histograms,  shown  on  the  right  side. 


5.2.2  RGB  Color  Cube 

Since  the  projection  of  the  RGB  color  space  on  the  different  color  components  masquerades  the 
information  about  the  occurrence  of  color  combinations,  we  use  a  3-dimensional  color  model 
as  symbolically  illustrated  in  Figure  5.4.  This  Color  Cube  records  the  number  of  times  each 
color  code  appears  in  the  colored  image.  In  this  document,  we  use  counter(r,  g ,  b )  to  refer  to 
the  number  of  times  a  particular  color  code  (r,g,b)  occurs,  and  n &  to  refer  to  the  total  number  of 
pixels  in  the  background. 


RGB  green 


Figure  5.4:  The  image  shows  a  simplified  RGB  color  cube  having  a  dimension  of  5  *  5  *  5.  For  our  application,  we 
use  the  complete  color  cube  matching  the  complete  RGB  color  space  of  256*256*256  different  colors. 
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The  color  cube  allows  an  estimate  of  the  probability  that  a  particular  color  appears  in  the  back¬ 
ground  by  estimating  the  conditional  probability  that  an  observed  pixel  has  this  color,  given 
that  the  pixel  belongs  to  the  background.  The  conditional  probability  can  be  estimated  with  a 
non-parametric  estimator  P(r,g,b)  —  ,  This  estimator  has  two  major  weaknesses. 

The  color  cube  contains  about  16.8  million  different  colors,  but  the  human  eye  is  not  able  to 
distinguish  all  these  colors.  Therefore,  a  color  that  seems  to  occur  frequently  in  an  image,  might 
be  spread  out  among  different  color  codes.  Meaning,  that  the  sum  of  these  color  codes  is  big, 
but  each  color  codes  does  not  occur  often.  For  an  example,  imagine  the  complete  image  shown 
in  Figure  5.1  is  labeled  as  background.  In  this  image,  the  maximum  occurrence  of  a  color 
max  (counter  (r,  g,b))  =  328  while  rib  =  708,  588  and  therefore  the  estimated  probability  that 
the  color  that  appears  most  in  the  background  is  observed  is  only  about  4.6  *  1CT4. 

The  second  weakness  is  that  the  color  cube  produces  gaps  between  observed  color  combina¬ 
tions.  For  example  counter(r,  g,  b )  equals  0,  while  counter(r  —  1,  g ,  b )  and  counter(r  +  1,  g,  b) 
are  positive.  In  that  case,  the  estimator  for  color  code  (r,g,b)  evaluates  to  zero,  but  this  is  intu¬ 
itively  not  correct.  Figure  5.5,  left  side,  shows  the  projections  of  one  color  cube  on  the  three 
different  color  components.  The  differences  in  height  between  neighboring  values  of  one  color 
component  are  results  of  the  gaps  between  neighboring  pixels  in  the  color  cube. 

5.2.3  RGB  Color  Cube  Including  Smoothing 

We  assume  camera  noise  makes  RGB  colors  be  normally  distributed,  and  use  Gaussian  Curves 
to  fill  possible  caps  in  a  generated  color  cube.  In  this  document,  we  use  the  term  smooth¬ 
ing  to  describe  this  process  of  filling  the  gaps.  The  main  idea  is  to  first  define  3-dimensional 
Gaussians,  each  one  centered  at  a  color  code  that  appeared  in  the  background,  and  scaled  by 
the  number  of  times  that  particular  color  code  appeared.  The  result  of  this  first  step,  are  sev¬ 
eral  3-dimensional  Gaussians  that  might  overlap  in  parts.  In  the  second  step,  we  build  the 
3-dimensional,  cumulative  distribution  of  all  these  Gaussian  Curves. 

Technically,  we  define  one  additional  parameter  a  that  defines  the  standard  deviation  of  all 
Gaussian  Curves.  Each  Gaussian  effects  all  color  codes  within  a  maximum  euclidean  distance 
of  3a,  how  strong  they  are  affected  is  evaluated  by  the  height  of  the  curve  at  this  distance. 
Figure  5.5  shows  the  effect  of  smoothing  using  3-dimensional  Gaussian  Curves  with  a  a  equals 
three. 

The  smoothing  process  is  computationally  complex.  Iterating  through  the  color  cube  itself  has 
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a  complexity  of  n3,  where  n  is  the  number  of  levels  for  one  color  component.  And  for  each 
of  these  steps,  we  need  to  iterate  through  all  effected  neighboring  color  codes  that  has  another 
cubic  complexity  of  (2(3cx))3.  Therefore,  the  overall  complexity  of  smoothing  is  6 3(n<r)3.  For 
our  color  cube  n  is  equals  256,  and  the  a  is  a  user-defined  parameter.  For  example,  let  a  be 
equals  to  three,  then  smoothing  requires  97, 844109  steps. 


Distribution:  Red  Distribution:  Red 
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Figure  5.5:  The  image  visualizes  the  effect  of  smoothing  on  the  assembled  RGB  color  distribution.  For  better 
visualization,  these  histograms  show  the  projection  of  the  same  color  cube  on  the  red,  green  and  blue  dimension. 
Therefor,  these  histograms  are  not  independent. 

Smoothing  solves  the  problem  of  gaps  between  neighboring  color  codes  in  the  color  cube,  but 
it  is  computationally  complex  and  requires  a  new  parameter,  the  standard  deviation  defining  the 
Gaussian  Curves.  Our  main  criticism  of  this  approach  is  the  user-defined  standard  deviation. 
Even  if  we  get  good  segmentation  results,  the  user  may  not  know  how  to  define  this  parameter. 

5.2.4  Statistical  Model  Using  Reference  Points 

The  color  cube  is  a  sufficient  model  to  describe  the  color  distribution  of  an  associated  part  of  a 
colored  image.  But,  the  color  cube  has  a  dimension  of  2563,  and  each  element  in  the  color  cube 
is  in  fact  a  variable  to  count  the  number  of  times  a  particular  color  code  appears.  To  provide 
an  example  of  the  required  working  memory:  In  Java  the  variable  type  that  requires  the  least 
memory  and  is  sufficient  in  this  context  is  short  integer  that  requires  2  bytes.  Therefore,  a  single 
color  cube  requires  up  to  2563  *  2  bytes,  which  is  equivalent  to  32,768  MByte. 

In  the  previous  section  of  this  Chapter,  we  explain  the  advantage  of  cutting  a  single  background 
area  into  smaller  parts.  In  our  experiments  and  depending  the  type  of  image,  the  number  of 
background  squares  can  be  up  to  200,  or  in  terms  of  working  memory  6,25  GByte. 


38 


Our  Statistical  Model  is  motivated  by  following  key  requirements: 


•  Using  the  same  dimension  as  the  RGB  color  space  in  order  to  not  loose  any  information 
due  to  projections. 

•  Replacing  the  computational  intensive  smoothing  without  getting  rid  of  the  gaps-filling 
effect. 

•  Representing  a  color  cube  in  a  compressed  way. 

We  use  clustering  to  group  all  color  codes  present  in  the  color  cube.  Different  to  the  clustering 
explained  in  Chapter  2,  where  clustering  is  used  to  segment  the  image,  we  use  clustering  to  find 
a  compressed  representation  of  the  color  cube.  After  applying  the  k-means  Clustering  algorithm 
for  up  to  10  clusters,  a  color  cube  is  represented  by  the  resulting  clusters.  Each  cluster  is  defined 
by  a  centroid  in  RGB  color  space,  the  estimated  mean  color  distance  to  a  belonging  pixel’s 
color,  the  estimated  standard  deviation  of  this  distance,  and  the  number  of  pixels  belonging 
to  the  cluster.  Figure  5.6  shows  an  example  of  the  set  of  centroids  after  applying  k-means 
clustering.  The  complete  statistics,  as  shown  in  Table  5.1  replaces  one  single  color  cube.  Using 
this  statistical  representation  instead  of  the  original  color  cube,  reduces  the  required  working 
memory  from  initially  32,768  MByte  to  approximately  260  Bytes. 


Distribution:  Red  Distribution:  Red 


Figure  5.6:  The  set  of  histograms  on  the  left  shows  the  actual  distribution  of  the  RGB  color  components.  The 
histograms  on  the  right  side  show  the  centroids  of  the  clusters  after  applying  a  K  mean  algorithm  for  k  <  10.  The 
reference  point  belong  to  the  statistics  shown  in  Table  5.1. 

The  information  shown  in  Table  5.1  defines  a  3-dimensional  Gaussian  Mixture  Model.  The 
projection  of  the  mixture  model  defined  by  Table  5.1,  on  the  red  color  component  is  shown  in 
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Centroid  in  RGB  color  space 
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Table  5.1 :  Table  provides  an  example  statistics  for  the  clusters  after  applying  a  K  mean  algorithm  for  k  <  10.  The 
centroids  are  also  shown  in  Figure  5.6. 


Figure  5.7,  first  row.  The  image  shown  in  the  second  row  of  Figure  5.7  shows  the  cumulative 
curve  that  includes  all  Gaussian  Curves  defined  by  the  statistics  in  Table  5.1. 
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Figure  5.7:  The  figure  in  the  first  row  shows  the  projection  of  the  overlapping  Gaussian  Curves  defined  by  the 
centroids  and  the  standard  deviations  as  shown  in  Table  5.1.  The  figure  in  the  second  row  shows  the  cumulative 
curve  including  all  Gaussian  Curves  of  the  figure  in  the  first  row. 


Assuming  that  a  cluster  C*  is  defined  by  a  centroid  (ry,  g,.  bi),  the  estimated  mean  color  distance 
jli  and  the  estimated  standard  deviation  of  this  distance  sd, .  We  use  Equation  (5.2)  to  calculate 
the  probability  that  a  color  (rr,  gi,b\)  belongs  to  this  cluster. 
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P{{ri,9i,bi)  e  Q} 


2$ 


(5.2) 


^  \/(r  i  ) 2 + (gi  -g» ) 2 + (foi  -  biyt-fii  ^ 


^{(n,3iA)  e  B}  < 


2  max 

i 


^  \/(n-T-i)2  +  (gi-gi)2+(6i-6i)2'-Ai^  ^ 


(5.3) 


For  our  purpose,  we  are  interested  in  the  probability  that  an  observed  color  fits  into  a  background 
B.  But,  a  single  background  is  represented  by  up  to  10  different  clusters.  We  compute  an  upper 
bound  of  the  probability  to  observe  a  color  (rq,  fji,  b\  )  that  belongs  to  one  of  these  clusters  using 
Equation  (5.3). 

5.3  Defining  the  Training  Area  for  the  IED 

Different  to  the  objects  shown  in  Figure  1.1,  IEDs  can  be  concealed  or  camouflaged.  In  general, 
because  of  the  nature  of  IEDs,  the  training  area  of  IEDs  cannot  be  defined  by  a  single  area, 
such  that  all  colors  present  in  the  IED  are  included  in  the  training  region  without  including 
background  colors  as  well.  This  section  explains  techniques  we  use  to  define  the  training  area 
for  the  IED. 

5.3.1  Single  Object  Label 

Figure  5.8  illustrates  the  problem  if  we  use  a  single  object  label  to  define  the  training  area  of 
an  IED.  In  this  example,  it  is  impossible  to  define  one  single  area  that  includes  all  colors  of  the 
actual  IED  without  including  background  colors  as  well.  The  direct  result  of  this  insufficient 
object  label  is  a  wrong  color  model,  similar  to  the  set  of  histograms  shown  in  Figure  5.8.  Using 
this  color  model  results  in  a  segmentation  that  tends  to  include  background  at  the  boundaries 
of  the  actual  IED.  It  is  possible  to  reduce  this  effect  by  using  only  small  weights  for  the  region 
property  for  the  segmentation,  but  this  has  more  of  a  trial- and-error  approach  than  an  algorithm. 

Using  a  single  object  label  results  in  a  segmentation  similar  to  the  example  shown  in  Figure  5.8 
at  the  top  right.  A  new  training  image  generated  from  this  segmentation  does  not  look  right, 
because  it  includes  background  from  the  original  image,  as  demonstrated  in  Figure  5.8  bottom 
right. 

5.3.2  Multiple  Object  Labels 

Figure  5.9  gives  an  example  where  we  define  multiple  training  areas,  each  one  includes  some 
colors  present  in  the  IED.  Using  multiple  training  regions,  allows  to  include  all  colors  present  in 
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Figure  5.8:  The  image  gives  an  example  of  a  single  object  label  and  the  associated  set  of  RGB  color  histograms 
generated  based  on  this  user-defined  training  area.  The  image  on  the  top  right  shows  one  segmentation  result  that 
illustrates  the  effect  background  color  cause,  if  included  into  the  color  model  of  the  IED.  The  image  at  the  bottom 
right  is  the  result  of  combining  the  segmented  IED  with  a  different  background. 


the  IED  without  including  areas  that  belong  to  the  background.  The  color  model  for  the  IED  is 
generated  based  on  all  training  areas.  The  associated  set  of  color  histograms  as  shown  in  Figure 
5.9  includes  less  background  colors  than  the  histograms  shown  in  Figure  5.8.  And  therefore, 
the  segmented  IED  includes  less  background  than  the  segmented  IED  in  the  example  of  Figure 
5.8.  Since  the  background  is  completely  excluded  from  the  segmentation,  the  segmented  IED 
can  be  combined  with  a  different  background,  to  generate  a  new  training  image  as  shown  at  the 
bottom  right  of  Figure  5.9. 


Distribution:  Red 
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Figure  5.9:  The  image  shows  an  example  for  using  different  object  labels  in  order  to  include  significant  colors  codes 
while  excluding  the  color  codes  of  the  background  covering  the  IED.  Compare  to  the  histograms  shown  at  Figure 
5.8,  the  joint  object  label  include  less  background  colors.  The  image  on  the  top  right  shows  one  segmentation  result. 
The  image  at  the  bottom  right  is  the  result  of  combining  the  segmented  IED  with  a  different  background. 


42 


5.3.3  Estimated  Initial  Object  Label 

Multiple  Object  Labels  work  well,  but  in  the  context  of  IEDs  this  approach  requires  a  precision 
that  is  for  some  IEDs  close  to  segmenting  the  IED  manually.  Motivated  by  the  idea  of  incom¬ 
plete  labeling,  as  it  is  used  in  Rother,  Kolmogorov  and  Blake  2004  [7],  we  use  the  estimated 
distance  to  the  background  to  estimate  the  initial  object  label  automatically. 

Assuming  that  a  background  B  is  represented  by  a  set  of  clusters  indexed  by  i  and  each  cluster 
is  defined  by  a  centroid  (ry,  g, .  bj)  and  the  number  of  belonging  pixels  vj,  .  Then,  the  weighted 
distance  d  of  a  pixel  with  color  ( r,g,b )  to  the  background  B  can  be  expressed  as  shown  in 
Equation  (5.4). 


d(r,g,  b)  —  T,di(r,9,t>)wi  (5.4) 

i  %  i 

di(r,g,  b)  —  y/(r  -  n)2  +  (g  -  gt )2  +  (b  -  b{ )2 


The  maximum  distance  in  the  color  cube  we  use  is  442.  Therefore,  we  build  a  one-dimensional 
vector  with  length  442  that  stores  a  color-gradient  from  blue  to  red.  To  build  a  heat  map  showing 
the  distance  of  each  pixel  color  to  the  background  B,  we  replace  the  actual  pixel  color  by  the 
gradient-color  associated  with  the  distance  to  B.  Figure  5.10  left  side  shows  an  example  of  an 
IED  we  need  to  segment  from  the  background.  The  image  in  the  middle  shows  the  heat  map 
of  the  same  image,  where  the  actual  colors  are  replaced  with  the  color  corresponding  to  the 
distance  to  the  background  B.  The  heat  map  visualizes  the  tendency  that  pixel  belongs  to  the 
background.  As  more  red  a  pixel  is  represented  in  the  heat  map,  as  more  unlikely  it  is  that  the 
pixel  belongs  to  the  background.  But,  to  generate  a  training  area  for  the  IED,  tendencies  are  not 
enough.  We  need  to  define  a  criterion  that  decides,  whether  or  not  to  include  the  pixel  into  the 
training  area. 

There  are  different  possible  solutions.  We  count  the  number  of  adjacent  pixels  that  are  closer  to 
red  than  to  blue.  If  this  number  is  greater  than  half  of  the  adjacent  pixels,  we  include  the  pixel 
into  the  training  area.  Applying  this  criterion  on  the  heat  map  results  in  an  initial  object  label, 
as  shown  in  Figure  5.10  right  side. 

5.4  Color  Model  Representing  the  Training  Area  of  the  IED 

We  use  the  same  color  models  as  we  use  for  the  background  area.  But,  none  of  our  examples  of 
IEDs  shows  more  than  three  colors  that  appear  to  be  different  for  the  human  eye.  Therefore,  we 
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Figure  5.10:  This  set  of  images  shows  an  example  of  an  IED  partially  covered  with  material  from  the  background. 
The  image  in  the  middle  shows  the  estimated  color  distance  to  the  background  visualized  by  color  radiant  from  blue 
to  red.  The  area  defined  by  the  red  pixels  on  the  right  image  define  the  initial  object  label  and  provide  the  training 
area  for  the  IED. 


limit  the  number  of  cluster  for  statistical  models  representing  the  actual  IED  to  three. 

Similar  to  Equations  (5.2)  and  (5.3),  we  estimate  the  upper  bound  of  the  probability  that  a  color 
(r i,  gi,  bi)  belongs  to  the  IED  O  as  shown  in  Equation  (5.5). 

P{(r1,g1,b1)  G  O}}  <  2  max  ^<f>  ^\/^-rd2+(gi-gd2+(f'i-fed2-A^  (5.5) 
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CHAPTER  6: 

Experimental  Evaluation  of  Novel 

Algorithms 


6.1  A  Complete  Segmentation  Algorithm 

Imagine,  we  need  to  generate  new  Training  Images  showing  the  same  IED  type  as  in  the  colored 
image  of  Figure  6.1.  To  achieve  this  goal,  we  first  have  to  segment  the  IED  from  the  background. 
The  segmentation  has  to  be  as  precise  as  possible.  But,  we  could  tolerate  missing  minor  parts 
of  the  IED.  Whereas  background  included  into  the  segmentation  limits  the  further  usage  of  the 
segmented  IED. 


Figure  6.1 :  The  image  shows  the  colored  image  used  for  the  detailed  segmentation  example  in  Chapter  6. 

6.1.1  Background  Models 

We  first  need  to  label  the  background  training  area  as  shown  in  Figure  6.2.  The  user  defines 
the  background  area  by  drawing  a  rectangle,  and  the  algorithm  automatically  divides  this  into 
smaller  areas  of  similar  size.  The  algorithm  also  adds  overlapping  areas,  to  ensure  that  all  colors 
that  appear  close  to  each  other,  are  covered  by  at  least  one  area. 
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For  each  of  these  smaller  areas,  we  generate  a  color  cube  that  represents  the  complete  color  dis¬ 
tribution.  A  k-means  Clustering  algorithm  compresses  the  information  into  a  statistical  model. 
Let  the  background  areas  be  indexed  by  n,  and  each  background  area  bn  be  statistically  de¬ 
scribed  by  up  to  ten  clusters  indexed  by  i.  The  cluster  cn,t  refers  to  the  ith  cluster  of  the  nth 
background  area.  Table  6.1  shows  the  set  of  clusters  associated  with  the  highlighted  area  in 
the  background  shown  in  Figure  6.2.  The  complete  model  B  describing  the  user-defined  back¬ 
ground  can  be  expressed  as  shown  in  Equation  (6.1).  The  background  model  B  might  be  stored 
for  further  segmentations  showing  a  similar  scenery. 


B  =  {bn  |  all  small  areas  n}  (6.1) 

bn  =  {cn,i  |  all  clusters  i  in  area  n} 


Figure  6.2:  The  image  shows  an  example  of  a  user-defined  training  area  for  the  background.  The  big  rectangle 
shows  the  user-defined  area,  whereas  all  smaller  areas  are  automatically  defined  by  the  algorithm. 


Centroid  in  RGB  color  space 

distance 

SD  distance 

Number  of  belonging  Pixel 

(158,154,144) 

16.65 

12.25 

121 

(194,195,181) 

5.65 

2.56 

5247 

(204,205,193) 

7.30 

6.47 

4613 

(182,182,169) 

7.83 

4.01 

1137 

Table  6.1 :  Table  shows  the  set  of  clusters  associated  with  the  highlighted  area  in  the  background  shown  in  Figure 

6.2. 

6.1.2  Initial  Object  Labels 

Figure  6.3  shows  an  example  of  the  heat  map  of  an  unclassified  region.  Size  and  position  of 
this  unclassified  region  can  be  changed  by  the  user.  Every  time  the  unclassified  area  changes, 
the  algorithm  calculates  a  statistical  model  with  three  clusters  for  the  unclassified  region. 


46 


The  algorithms  selects  the  background  area  that  matches  the  unclassified  area  best.  We  call  the 
background  area  that  matches  the  unclassified  model  best,  the  referenced  background  area.  As¬ 
suming,  that  the  centroid  of  the  ith  cluster  of  the  nth  background  area  is  given  by  {rn^  <j1ul .  bri  J  ) 
and  the  centroid  of  the  jth  cluster  in  the  unclassified  area  is  given  by  (r?,  g:j.  bf),  then  the  refer¬ 
enced  background  area  br  is  selected  by  minimizing  Equation  (6.2). 


br  =  arg  min 


in  ££(7  (rn,i  ~  rf2  +  (gnii  -  gj)2  +  (bn,i  -  b^WiWj 


(6.2) 


Once  the  referenced  background  area  is  computed,  the  heat  map  is  generated  by  replacing 
the  original  color  of  each  pixel  in  the  unclassified  area  with  the  color  that  corresponds  to  the 
weighted  distance  to  the  referenced  background  area  as  shown  in  Equation  (5.4).  Figure  6.3 
shows  a  heat  map,  where  blue  represents  zero  distance  and  red  represents  the  maximum  dis¬ 
tance  in  the  color  cube. 


Figure  6.3:  The  image  shows  the  heat  map  of  an  unclassified  area.  The  heat  map  is  generated  based  on  the 
distance  to  the  referenced  background  area  highlighted  in  red.  This  background  area  is  the  best  matching  area  for 
the  current  unclassified  area. 


A  basic  noise  reduction,  as  explained  in  Chapter  5.3.3,  performed  on  the  generated  heat  map, 
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results  in  the  initial  object  label  as  shown  in  Figure  6.4,  right  side.  This  estimated  object  label 
defines  the  training  area  for  the  IED.  Meaning,  that  a  new  color  model  is  built  that  includes  all 
colors  that  correspond  to  this  object  label.  As  explained  in  Chapter  5,  the  statistical  model  of 
these  colors  is  similar  to  the  background  models,  but  the  number  of  clusters  is  limited  to  four. 


Figure  6.4:  The  image  on  the  left  shows  the  IED.  The  second  image  shows  the  same  part  of  the  image,  but  as 
heat  map  generated  based  on  the  referenced  background  area  and  the  weighted  distance  of  each  pixel’s  color  to 
this  background  area.  The  image  on  the  right,  shows  the  estimated  object  label  used  to  perform  the  segmentation 
algorithm. 

6.1.3  Constructing  the  Network 

The  subimage  represented  by  the  heat  map  as  shown  in  Figure  6.3  is  transformed  into  an  undi¬ 
rected  network  G(V,E)  as  explained  in  Chapter  3.4.2.  Table  6.2  gives  the  implemented  capacities 
corresponding  to  Equations  (5.3)  and  (5.5)  used  for  the  final  version  of  our  algorithm. 


For  the  first  test  series,  we  vary  a  and  A  to  explore  the  effect  they  have  on  the  segmentation 
result.  We  compare  the  results  of  this  first  test  series  with  a  second  test  series  where  A  varies 
and  cr  is  fixed  and  equals  the  expected  color  difference  among  adjacent  pixels  over  the  image. 


Edge 

Capacity 

Implementation 

V(zi,zs) 

d 

il  A)  exp 

=  (  E  (*i,c  -  Zi,c)2  ) 

\c£r,g,b  J 

(S,Zi) 

1 

"  inf  if  Zi  G  0 

0  if  Zi  belongs  to  backg. 

F(h(z,  a  —  1),  Zi)  otherwise 

\ 

8  max{V (zi,  Zj )}  +  1 

0 

k  —A  log  P{(ri,  gi,  bf)  e  B},  E(5.3) 

(zi,T) 

1 

0  if  Zi  e  0 

inf  if  Zi  belongs  to  backg. 

^ F(h(z,a  =  0),Zj)  otherwise 

\ 

0 

8ma x{V(zi,Zj)}  +  1 

k  — A  log  P{(r.j,  bf)  e  O},  E(5.5) 

Table  6.2:  Table  gives  the  final  version  of  capacities  we  use  to  segment  an  IED.  The  capacities  are  based  on  the 
equations  shown  in  Chapter  5. 
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6.2  Results  of  the  Segmentation  Task 

Similar  to  the  initial  parameter  setting  in  Chapter  4,  we  define  the  domain  of  the  parameters 
as:  A  G  [0, 1],  a  G  [20, 100]  and  the  number  of  decimals  equals  to  four.  Due  to  the  increased 
difficulty,  it  is  impossible  to  define  a  “perfect”  segmentation  manually.  Since  we  do  not  have  a 
perfect  segmentation  we  cannot  use  the  number  to  mislabeled  pixels  as  measurement  of  good¬ 
ness.  But,  based  on  the  assumption  that  a  good  segmentation  results  in  closed  areas,  the  number 
of  isolated  pixels  can  be  used  instead.  The  number  of  isolated  pixels  alone  is  not  sufficient. 
For  example,  a  segmentation  that  results  in  zero  pixels  labeled  as  IED  is  not  a  good  segmenta¬ 
tion,  but  the  number  of  isolated  pixels  is  zero.  Therefore,  we  also  evaluate  the  goodness  of  a 
segmentation  visually  and  by  combining  the  segmented  IED  with  a  new  background. 

We  use  the  number  of  isolated  pixels  as  a  measurement  of  goodness.  Since  the  term  “isolated” 
is  not  well  defined,  for  each  pixel  that  belongs  to  the  IED,  we  count  the  number  of  adjacent 
pixels  belonging  to  the  IED  as  well.  Figure  6.5  visualizes  the  effect  of  removing  the  object 
pixels  with  less  than  n  numbers  of  adjacent  pixels  belonging  to  an  IED. 


-- 

■  -  r+ 

r-* 

A 

B 

C 

D 

Figure  6.5:  The  series  of  images  show  the  effect  that  noise  reduction  has  on  the  segmentation  result.  Image  A 
shows  a  segmentation  result  without  any  noise  reduction.  In  image  B,  all  object  pixels  with  none  adjacent  object 
pixel  are  removed  from  the  segmentation.  In  image  B,  C  and  D,  all  object  pixels  with  less  or  equals  to  one,  two  and 
three  adjacent  object  pixel  are  removed. 

Examples  similar  to  the  one  in  Figure  6.5  show  that  the  quality  of  the  segmentation  can  always 
be  increased  by  removing  single  object  pixels  and  object  pixels  with  only  one  other  object  pixel 
in  their  neighborhood.  But,  object  pixels  with  more  than  one  adjacent  object  pixels  should  not 
be  removed. 

Table  6.3  shows  a  part  of  a  distribution  for  the  number  of  adjacent  object  pixels.  The  relations 
between  A,  a  and  the  number  of  adjacent  object  pixels  are  shown  in  Figure  6.6.  Similar  to  the 
test  series  in  Chapter  4,  we  can  observe  that  as  A  gets  higher,  the  number  of  isolated  object 
pixels  increases.  The  effect  of  cr  is  opposite,  as  a  increases,  greater  differences  in  color  code 
between  adjacent  pixels  are  accepted,  which  results  in  fewer  isolated  pixels. 
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A 

cr 

|  {zijzi  E  IED ,rii=  0}  | 

|  {Zi\Zi  G  IED,  77,;=  1}  | 

|  {ZijZi  E  IED,  77;=  2}  | 

0.88 

88.57 

875 

297 

75 

0.25 

57.14 

304 

220 

52 

0.67 

51.42 

732 

314 

79 

0.89 

83.60 

908 

303 

83 

0.53 

65.31 

385 

231 

62 

0.10 

68.57 

212 

178 

46 

0.33 

74.28 

296 

202 

45 

0.13 

100.00 

164 

148 

32 

0.22 

20.00 

710 

374 

93 

0.82 

33.88 

973 

382 

99 

0.64 

92.51 

378 

215 

55 

0.60 

24.11 

912 

388 

101 

0.39 

56.85 

341 

235 

61 

1.00 

40.00 

1074 

329 

83 

Table  6.3:  Table  gives  the  distribution  for  the  number  of  object  pixels  in  the  neighborhood  of  each  object  pixel,  for  a 
segmentation  that  depends  on  the  parameters  A  and  a. 


Figure  6.6:  The  plots  show  the  general  relation  between  the  number  of  isolated  pixels  and  the  parameters  A  and  a. 


The  effect  of  A  and  a  are  opposite,  and  the  effect  of  a  itself  is  dependent  on  A.  Therefore,  we 
fix  cr,  similar  to  Boykov  and  Jolly  2001  [8],  to  the  expected  color  difference  among  all  adjacent 
pixels  over  an  image  sample  as  shown  in  Equation  (6.3).  For  the  image  shown  in  Figure  6.4,  the 
expected  color  difference  equals  46.65.  The  corresponding  test  series  and  associated  results  are 
shown  in  Table  6.4.  The  final  test  series  limits  the  domain  of  A  to  [0,  0.5].  As  Figure  6.7  middle 
plot,  shows,  the  range  of  isolated  pixel  is  smaller  if  we  fix  the  parameter  (3  equals  to  46.65,  and 
decreases  again,  if  we  limit  the  range  for  A.  Therefore,  all  further  test  series  depend  only  on  the 
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parameter  A,  and  the  range  is  limited  to  [0,  0.5]. 


a  =  E[zm  -  zn] 


(6.3) 


A 

cr 

{zi\zi  G  IED,  7ii=  0} 

{zi\zi  G  IED,  rii=  1} 

0.88 

46.65 

724 

337 

0.25 

46.65 

193 

155 

0.67 

46.65 

438 

27 

0.89 

46.65 

751 

350 

0.53 

46.65 

268 

181 

0.10 

46.65 

140 

124 

0.33 

46.65 

208 

156 

0.13 

46.65 

158 

134 

0.22 

46.65 

184 

147 

0.82 

46.65 

672 

313 

0.64 

46.65 

403 

252 

0.60 

46.65 

350 

222 

0.39 

46.65 

215 

156 

1.00 

46.65 

871 

355 

Table  6.4:  Table  gives  the  distribution  for  the  number  of  object  pixels  in  the  neighborhood  of  each  object  pixel,  for  a 
segmentation  that  only  depends  on  the  parameters  A. 


Figure  6.7:  The  plots  show  the  distribution  of  the  number  of  isolated  pixels.  The  plot  on  the  left  shows  the  number 
of  isolated  pixels  for  A  in  [0, 1]  and  a  in  [20, 100].  The  plot  in  the  middle  shows  the  number  of  isolated  pixels  for  A  in 
[0, 1]  and  a  equals  to  46.65.  The  plot  in  the  middle  shows  the  number  of  isolated  pixels  for  A  in  [0, 0.5]  and  a  equals 
to  46.65. 
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CHAPTER  7: 

Possible  Applications  Using  Segmented 

Images 


The  results  of  this  thesis  have  three  different  applications:  defining  the  IED  on  an  image,  gener¬ 
ating  new  training  images,  helping  humans  detect  IEDs.  Following  are  some  examples  of  these 
applications. 


7.1  Defining  the  Area  on  an  IED 

To  use  a  colored  image  for  automated  IED  detection  training,  the  area  of  the  IED  must  be  de¬ 
fined,  in  order  to  distinguish  between  a  student’s  detection  and  misdetection.  In  some  examples, 
the  estimated  object  labels  are  close  to  the  final  segmentation  result.  In  these  cases  the  estimated 
object  labels  can  be  used  to  define  the  IED  as  shown  in  Figure  7.1.  But,  generally  the  estimated 
object  label  does  not  cover  the  complete  IED  as  shown  in  Figure  7.2,  and  the  area  of  the  IED 
has  to  be  defined  based  on  the  segmentation  result. 


Figure  7.1:  The  image  shows  an  unusual  example  where  the  estimated  object  label  covers  almost  the  complete 
IED  and  therefore  defines  the  IED  region  for  automated  detection  training  sufficiently.  In  this  case,  performing  the 
segmentation  to  define  the  IED  area  is  not  required. 


*  V* 

Figure  7.2:  The  image  shows  a  usual  example  where  the  estimated  object  label  does  not  cover  the  complete  IED. 
In  this  case,  performing  the  segmentation  is  necessary  to  define  the  IED  area. 
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7.2  Generating  new  Training  Images 

To  generate  new  training  images,  the  IED  has  to  be  segmented  as  precisely  as  possible.  In 
case  of  wrongly  labeled  pixels,  we  can  tolerate  object  pixels  that  are  labeled  as  “belonging  to 
the  background.”  But,  background  pixels  that  are  labeled  as  “belonging  to  the  IED”  limit  the 
further  usage  of  the  segmented  IED.  Figure  7.3  shows  examples  of  a  segmented  IED  combined 
with  different  backgrounds  to  generate  new  training  images. 


Figure  7.3:  The  image  shows  a  segmented  IED  combined  with  different  backgrounds  to  generate  new  training 
images. 


Figure  7.4  shows  the  main  problem  with  the  current  approach.  The  color  of  some  IED  parts  fit 
well  into  at  least  one  cluster  of  the  background  model.  As  a  result  these  colors  are  not  part  of 
the  estimated  object  label.  As  a  result,  object  pixels  with  colors  that  fit  the  background  model 
tend  to  be  labeled  as  “belonging  to  the  background.”  Figure  7.4  shows  a  segmentation  scenario, 
where  it  would  have  been  correct  to  include  background  colors  into  the  color  model  of  the  IED. 
Chapter  8,  Conclusion  and  Future  Work  discusses  possible  solutions  to  this  problem. 


Figure  7.4:  The  image  shows  a  segmented  IED  with  a  high  rate  of  object  pixels  wrongly  labeled  as  “belonging  to 
the  background.” 


7.3  Help  detecting  IEDs 

Overlays  replace  each  pixel  color  by  a  color  code  indicating  the  distance  to  the  background. 
This  functionality  is  not  based  on  any  assumption  and  can  be  computed  at  runtime.  For  all 
colored  images  showing  an  IED,  the  overlay  highlighted  either  the  IED  or  the  unusual  area 
around  the  IED.  Therefore,  overlays  as  shown  in  Figure  7.5  may  be  helpful  as  a  hybrid  between 
a  computer  pre-processing  the  image,  and  a  human  operator  providing  the  final  identification. 
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Figure  7.5  shows  overlays  for  different  parts  of  the  image  shown  in  Figure  5.1.  Each  row  in 
Figure  7.5  shows  the  part  of  the  original  image  we  are  interested  in,  the  referenced  background 
area  and  the  corresponding  overlay. 


Figure  7.5:  The  image  shows  different  parts  of  the  colored  image  shown  in  Figure  5.1 .  Each  row  shows  the  area  of 
interest,  the  referenced  background  and  the  generated  overlay.  Different  from  the  first  three  images,  the  last  image 
shows  an  IED.  The  visible  parts  of  this  IED  are  highlighted. 


Our  statistical  background  model  can  also  be  applied  to  images  different  than  the  image  used  to 
generate  the  model.  Figure  7.6  and  Figure  7.7  show  examples  of  overlays  based  on  a  statistical 
model  that  is  generated  from  a  different  image.  Different  to  the  IEDs  shown  in  Figure  7.6,  the 
IEDs  in  Figure  7.7  are  more  difficult  to  detect  for  humans.  But,  in  both  scenarios,  the  overlay 
hides  the  increased  difficulty  by  highlights  the  IEDs.  A  possible  application  of  statistical  back¬ 
ground  models  applied  to  different  images  is  explained  in  Chapter  8,  Conclusion  and  Future 
Work. 
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Figure  7.6:  The  image  shows  highlighted  lEDs  based  on  a  statistical  background  model  generated  from  a  different 
image. 


Figure  7.7:  The  image  shows  highlighted  lEDs  based  on  a  statistical  background  model  generated  from  a  different 
image.  Different  to  the  lEDs  shown  in  Figure  7.6,  detecting  the  lEDs  on  the  original  image  is  more  difficult. 
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CHAPTER  8: 

Conclusion  and  Future  Work 


We  address  the  increased  difficulty  while  segmenting  IEDs  from  a  colored  image  with  three  key 
contributions: 

•  Our  algorithm  automatically  divides  a  user-defined  background  area  into  smaller  areas. 
We  generate  separate  color  models  for  each  of  these  area,  to  ensures  that  a  color  model 
includes  only  colors  that  appear  in  the  same  area  of  the  background. 

•  We  compress  each  of  these  complex  color  models  into  a  statistical  model,  by  applying 
a  K-means  Clustering  algorithm.  This  increases  the  number  of  background  models  we 
can  hold  simultaneously  in  working  memory,  and  allows  to  generate  a  set  of  background 
models  that  describes  a  complete  environment. 

•  We  estimate  the  initial  object  labels  based  on  the  color  distance  to  the  background.  This 
approach  enables  generating  color  models  for  IEDs  even  if  they  are  partially  hidden  or 
covered  with  background.  It  also  allows  IED  segmentation  without  an  user-provided 
object  label. 

Weak  segmentation  results  as  shown  in  Figure  7.4  are  a  result  of  an  assumption  made  to  estimate 
the  initial  object  label.  We  assume  that  colors  that  are  unusual  for  the  background  are  part  of  the 
IED  and  include  them  into  the  initial  object  label.  But,  this  also  means  that  we  do  not  include 
colors  into  the  initial  object  label  that  are  likely  to  appear  in  the  background.  This  assumption  is 
necessary  for  an  unsupervised  estimation  of  the  initial  object  label.  But  as  shown  in  Figure  7.4, 
this  assumption  is  not  generally  true.  The  image  shows,  that  a  part  of  the  IED,  the  display  of 
the  cell  phone,  is  colored  similar  to  the  background,  and  therefore  not  included  into  the  initial 
object  label.  This  issue  can  be  solved,  by  allowing  the  user  to  iteratively  add  missing  parts  and 
include  them  into  the  initial  object  label. 

Currently,  the  region  property  is  based  on  color  information  only.  But,  the  dimension  of  the 
underlying  model  can  be  increased  by  adding  more  information  like  infrared  or  temperature . 
Using  temperature  based  heat-maps  is  a  common  technique  to  detect  IEDs  that  are  completely 
hidden.  Adding  this  information  provides  an  additional  criteria  to  classify  a  pixel  as  belonging 
to  the  IED  or  the  background. 
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The  final  version  of  our  algorithm  is  optimized  for  IEDs  that  are  partially  hidden,  concealed  and 
have  a  small  area  compare  to  the  background.  This  optimized  version  results  in  weak  segmen¬ 
tations  if  applied  to  IEDs  that  are  easier  to  segment.  Chapter  5  describes  different  approaches 
to  label  the  initial  object  area.  All  of  these  approaches  are  helpful  in  some  scenarios,  and  none 
of  the  approaches  is  generally  better  than  the  others.  Therefore,  combining  all  approaches  to 
label  the  initial  object  area  into  a  single  application  would  generate  a  toolbox  that  supports  the 
segmentation  of  many  IEDs. 

The  statistical  models  we  use  to  highlight  unusual  areas  on  an  image,  can  be  generated  based 
on  a  different  image.  It  is  also  possible  to  combine  statistical  models  from  more  than  one 
colored  images.  This  could  be  helpful  to  describe  the  complete  environment  of  an  area  and 
use  an  overlay  that  highlights  all  region  on  an  image  that  do  not  fit  into  this  environment. 
Before  spending  more  time  on  overlay  functionality,  it  should  be  tested,  if  the  overlays  have  a 
statistically  significant  impact  on  the  detection  rate. 
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APPENDIX  A: 

Used  Software 


Following  applications  are  used  while  working  on  this  thesis: 

•  Java  SDK  1.6  is  used  to  implement  all  versions  of  our  algorithm.  We  use  an  implementa¬ 
tion  of  Ford  Folkerson’s  Minimum  Cut  algorithm  that  is  developed  by  Robert  Sedgewick 
and  Kevin  Wayne.  We  use  a  Java  library,  provided  by  MySQL,  to  connect  our  application 
to  a  MySQL  database.  Besides  these  external  libraries,  all  required  development  were 
carried  out  by  us. 

•  MySQL  5.2.3  is  used  to  store  the  parameters  for  our  test  series  and  the  associated  results. 

•  R  and  the  functionalities  provided  by  the  lattice  package  is  used  to  visualize  the  relations 
between  parameters  and  the  measurements  of  goodness. 

•  An  Excel  worksheet  provided  by  Alex  D.  MacCalman  is  used  to  generate  the  data  points 
for  a  Second  Order  Nearly  Orthogonal  Design  of  Experiments. 
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